[jira] [Commented] (HDDS-1896) Suppress WARN log from NetworkTopology#getDistanceCost

2019-08-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899568#comment-16899568
 ] 

Hudson commented on HDDS-1896:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17037 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17037/])
HDDS-1896. Suppress WARN log from NetworkTopology#getDistanceCost. (bharat: rev 
065cbc6b5460427e24c7cb4eaa9538c080f35616)
* (edit) 
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/net/NetworkTopologyImpl.java


> Suppress WARN log from NetworkTopology#getDistanceCost 
> ---
>
> Key: HDDS-1896
> URL: https://issues.apache.org/jira/browse/HDDS-1896
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When RackAwareness is enabled and a client is outside the network topology, 
> the distance calculation floods the SCM log with the following messages. This 
> ticket is opened to suppress the WARN log.
> {code}
> 2019-08-01 23:08:05,011 WARN org.apache.hadoop.hdds.scm.net.NetworkTopology: 
> One of the nodes is outside of network topology
> 2019-08-01 23:08:05,011 WARN org.apache.hadoop.hdds.scm.net.NetworkTopology: 
> One of the nodes is outside of network topology
> 2019-08-01 23:08:05,011 WARN org.apache.hadoop.hdds.scm.net.NetworkTopology: 
> One of the nodes is outside of network topology
> {code}
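>
> A minimal sketch of the shape of such a change, assuming the message is logged
> through an SLF4J-style {{LOG}} field (illustrative only, not the committed
> patch):
> {code:java}
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> class DistanceCostLogSketch {
>   private static final Logger LOG =
>       LoggerFactory.getLogger(DistanceCostLogSketch.class);
>
>   void reportOutsideTopology() {
>     // Demoted from WARN to DEBUG so out-of-topology clients no longer
>     // flood the SCM log on every distance calculation.
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("One of the nodes is outside of network topology");
>     }
>   }
> }
> {code}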



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-03 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899565#comment-16899565
 ] 

Lisheng Sun commented on HDFS-14313:


ping [~linyiqun] Would you mind taking a look at this patch? Thank you.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are 
> insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the FsDatasetImpl#volumeMap#ReplicaInfos in 
> memory is very cheap and accurate. 
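>
> A hedged sketch of the aggregation this proposes; the map shape below is a
> simplification, not the actual FsDatasetImpl API:
> {code:java}
> import java.util.Map;
>
> class UsedSpaceSketch {
>   /** Sum per-replica on-disk bytes tracked in memory instead of forking du/df. */
>   static long usedBytes(Map<Long, Long> replicaBytesOnDisk) {
>     long used = 0L;
>     for (long bytes : replicaBytesOnDisk.values()) {
>       used += bytes;
>     }
>     return used;
>   }
> }
> {code}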



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14513) FSImage which is saving should be clean while NameNode shutdown

2019-08-03 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899564#comment-16899564
 ] 

He Xiaoqiao commented on HDFS-14513:


[~xkrogen] Thanks for tracing this issue. I totally agree with backporting, and 
I would like to attach another patch for branch-2. Please let me know if we 
need to merge to other versions. Thanks again.

> FSImage which is saving should be clean while NameNode shutdown
> ---
>
> Key: HDFS-14513
> URL: https://issues.apache.org/jira/browse/HDFS-14513
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14513.001.patch, HDFS-14513.002.patch, 
> HDFS-14513.003.patch, HDFS-14513.004.patch, HDFS-14513.005.patch, 
> HDFS-14513.006.patch, HDFS-14513.007.patch
>
>
> Checkpointer/FSImageSaver is a regular task that dumps NameNode metadata to 
> disk, at most once per hour by default. If it receives a command (e.g. 
> transition to active in HA mode) it cancels the checkpoint and deletes the tmp 
> files using {{FSImage#deleteCancelledCheckpoint}}. However, if the NameNode 
> shuts down during a checkpoint, the tmp files are never cleaned up. 
> Consider a namespace with 500M inodes+blocks: one checkpoint can take 5~10 min 
> to finish, and if we shut down the NameNode during checkpointing, the fsimage 
> checkpoint file is never cleaned. After a long time, there could be many 
> useless checkpoint files. So I propose that we add a hook to clean them up on 
> shutdown.
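>
> A minimal sketch of the proposed hook, assuming the in-progress checkpoint
> file is known; the wiring into FSImage is left out:
> {code:java}
> import java.io.File;
>
> class CheckpointCleanupSketch {
>   /** Register a JVM shutdown hook that removes an in-progress checkpoint file. */
>   static void registerCleanup(File tmpCheckpointFile) {
>     Runtime.getRuntime().addShutdownHook(new Thread(() -> {
>       if (tmpCheckpointFile.exists() && !tmpCheckpointFile.delete()) {
>         System.err.println("Could not delete " + tmpCheckpointFile);
>       }
>     }));
>   }
> }
> {code}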



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1565) Rename k8s-dev and k8s-dev-push profiles to docker and docker-push

2019-08-03 Thread Bharat Viswanadham (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-1565:
-
Labels:   (was: pull-request-available)

> Rename k8s-dev and k8s-dev-push profiles to docker and docker-push
> --
>
> Key: HDDS-1565
> URL: https://issues.apache.org/jira/browse/HDDS-1565
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Based on the feedback from [~eyang] I realized that the names of the k8s-dev 
> and k8s-dev-push profiles are not expressive enough, as the created containers 
> can be used not only with kubernetes but also with any other container 
> orchestrator.
> I propose to rename them to docker/docker-push.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1896) Suppress WARN log from NetworkTopology#getDistanceCost

2019-08-03 Thread Bharat Viswanadham (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-1896.
--
   Resolution: Fixed
Fix Version/s: 0.5.0

> Suppress WARN log from NetworkTopology#getDistanceCost 
> ---
>
> Key: HDDS-1896
> URL: https://issues.apache.org/jira/browse/HDDS-1896
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When RackAwareness is enabled and a client is outside the network topology, 
> the distance calculation floods the SCM log with the following messages. This 
> ticket is opened to suppress the WARN log.
> {code}
> 2019-08-01 23:08:05,011 WARN org.apache.hadoop.hdds.scm.net.NetworkTopology: 
> One of the nodes is outside of network topology
> 2019-08-01 23:08:05,011 WARN org.apache.hadoop.hdds.scm.net.NetworkTopology: 
> One of the nodes is outside of network topology
> 2019-08-01 23:08:05,011 WARN org.apache.hadoop.hdds.scm.net.NetworkTopology: 
> One of the nodes is outside of network topology
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1896) Suppress WARN log from NetworkTopology#getDistanceCost

2019-08-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1896?focusedWorklogId=288515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288515
 ]

ASF GitHub Bot logged work on HDDS-1896:


Author: ASF GitHub Bot
Created on: 04/Aug/19 05:33
Start Date: 04/Aug/19 05:33
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #1214: 
HDDS-1896. Suppress WARN log from NetworkTopology#getDistanceCost. Co…
URL: https://github.com/apache/hadoop/pull/1214
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 288515)
Time Spent: 40m  (was: 0.5h)

> Suppress WARN log from NetworkTopology#getDistanceCost 
> ---
>
> Key: HDDS-1896
> URL: https://issues.apache.org/jira/browse/HDDS-1896
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When RackAwareness is enabled and a client is outside the network topology, 
> the distance calculation floods the SCM log with the following messages. This 
> ticket is opened to suppress the WARN log.
> {code}
> 2019-08-01 23:08:05,011 WARN org.apache.hadoop.hdds.scm.net.NetworkTopology: 
> One of the nodes is outside of network topology
> 2019-08-01 23:08:05,011 WARN org.apache.hadoop.hdds.scm.net.NetworkTopology: 
> One of the nodes is outside of network topology
> 2019-08-01 23:08:05,011 WARN org.apache.hadoop.hdds.scm.net.NetworkTopology: 
> One of the nodes is outside of network topology
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1896) Suppress WARN log from NetworkTopology#getDistanceCost

2019-08-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1896?focusedWorklogId=288514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288514
 ]

ASF GitHub Bot logged work on HDDS-1896:


Author: ASF GitHub Bot
Created on: 04/Aug/19 05:32
Start Date: 04/Aug/19 05:32
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on issue #1214: HDDS-1896. 
Suppress WARN log from NetworkTopology#getDistanceCost. Co…
URL: https://github.com/apache/hadoop/pull/1214#issuecomment-517975361
 
 
   Test failures are not related to this patch.
   Thank You @xiaoyuyao for the fix.
   I have committed this to the trunk.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 288514)
Time Spent: 0.5h  (was: 20m)

> Suppress WARN log from NetworkTopology#getDistanceCost 
> ---
>
> Key: HDDS-1896
> URL: https://issues.apache.org/jira/browse/HDDS-1896
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When RackAwareness is enabled and a client is outside the network topology, 
> the distance calculation floods the SCM log with the following messages. This 
> ticket is opened to suppress the WARN log.
> {code}
> 2019-08-01 23:08:05,011 WARN org.apache.hadoop.hdds.scm.net.NetworkTopology: 
> One of the nodes is outside of network topology
> 2019-08-01 23:08:05,011 WARN org.apache.hadoop.hdds.scm.net.NetworkTopology: 
> One of the nodes is outside of network topology
> 2019-08-01 23:08:05,011 WARN org.apache.hadoop.hdds.scm.net.NetworkTopology: 
> One of the nodes is outside of network topology
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1829) On OM reload/restart OmMetrics#numKeys should be updated

2019-08-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1829?focusedWorklogId=288512&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288512
 ]

ASF GitHub Bot logged work on HDDS-1829:


Author: ASF GitHub Bot
Created on: 04/Aug/19 05:31
Start Date: 04/Aug/19 05:31
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on issue #1187: HDDS-1829 On 
OM reload/restart OmMetrics#numKeys should be updated
URL: https://github.com/apache/hadoop/pull/1187#issuecomment-517975267
 
 
   Test failures are mostly unrelated to the patch.
   Will wait for one more CI run.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 288512)
Time Spent: 4.5h  (was: 4h 20m)

> On OM reload/restart OmMetrics#numKeys should be updated
> 
>
> Key: HDDS-1829
> URL: https://issues.apache.org/jira/browse/HDDS-1829
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> When OM is restarted or the state is reloaded, OM Metrics is re-initialized. 
> The saved numKeys value might not be valid as the DB state could have 
> changed. Hence, the numKeys metric must be updated with the correct value on 
> metrics re-initialization.
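>
> A toy sketch of the fix's shape, assuming the key table can be sized after the
> reload (names simplified, not the actual OM API):
> {code:java}
> import java.util.Map;
>
> class NumKeysSketch {
>   private long numKeys; // gauge as in OmMetrics (simplified)
>
>   /** On re-initialization, recount keys from the reloaded table state. */
>   void reinitialize(Map<String, byte[]> reloadedKeyTable) {
>     numKeys = reloadedKeyTable.size();
>   }
> }
> {code}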



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1829) On OM reload/restart OmMetrics#numKeys should be updated

2019-08-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1829?focusedWorklogId=288513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288513
 ]

ASF GitHub Bot logged work on HDDS-1829:


Author: ASF GitHub Bot
Created on: 04/Aug/19 05:31
Start Date: 04/Aug/19 05:31
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on issue #1187: HDDS-1829 On 
OM reload/restart OmMetrics#numKeys should be updated
URL: https://github.com/apache/hadoop/pull/1187#issuecomment-517975274
 
 
   /retest
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 288513)
Time Spent: 4h 40m  (was: 4.5h)

> On OM reload/restart OmMetrics#numKeys should be updated
> 
>
> Key: HDDS-1829
> URL: https://issues.apache.org/jira/browse/HDDS-1829
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> When OM is restarted or the state is reloaded, OM Metrics is re-initialized. 
> The saved numKeys value might not be valid as the DB state could have 
> changed. Hence, the numKeys metric must be updated with the correct value on 
> metrics re-initialization.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1870) ConcurrentModification at PrometheusMetricsSink

2019-08-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899563#comment-16899563
 ] 

Hudson commented on HDDS-1870:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17036 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17036/])
HDDS-1870. ConcurrentModification at PrometheusMetricsSink (#1179) (bharat: rev 
f4df97fd899eaf0a2e6829827dc905664a8c)
* (edit) 
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/PrometheusMetricsSink.java


> ConcurrentModification at PrometheusMetricsSink
> ---
>
> Key: HDDS-1870
> URL: https://issues.apache.org/jira/browse/HDDS-1870
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Encountered on {{ozoneperf}} compose env when running low on CPU:
> {code}
> om_1  | java.util.ConcurrentModificationException
> om_1  |   at 
> java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1493)
> om_1  |   at 
> java.base/java.util.HashMap$ValueIterator.next(HashMap.java:1521)
> om_1  |   at 
> org.apache.hadoop.hdds.server.PrometheusMetricsSink.writeMetrics(PrometheusMetricsSink.java:123)
> om_1  |   at 
> org.apache.hadoop.hdds.server.PrometheusServlet.doGet(PrometheusServlet.java:43)
> {code}
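>
> One common fix for this class of bug, sketched below with assumed names (not
> necessarily the committed patch): keep the metric lines in a
> {{ConcurrentHashMap}}, whose weakly consistent iterators never throw
> {{ConcurrentModificationException}}:
> {code:java}
> import java.io.IOException;
> import java.io.Writer;
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
>
> class MetricsBufferSketch {
>   private final Map<String, String> metricLines = new ConcurrentHashMap<>();
>
>   void put(String key, String prometheusLine) {
>     metricLines.put(key, prometheusLine);
>   }
>
>   void write(Writer out) throws IOException {
>     // Weakly consistent iteration: safe while another thread keeps putting.
>     for (String line : metricLines.values()) {
>       out.write(line);
>     }
>   }
> }
> {code}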



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1870) ConcurrentModification at PrometheusMetricsSink

2019-08-03 Thread Bharat Viswanadham (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-1870:
-
   Resolution: Fixed
Fix Version/s: 0.5.0
   Status: Resolved  (was: Patch Available)

> ConcurrentModification at PrometheusMetricsSink
> ---
>
> Key: HDDS-1870
> URL: https://issues.apache.org/jira/browse/HDDS-1870
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Encountered on {{ozoneperf}} compose env when running low on CPU:
> {code}
> om_1  | java.util.ConcurrentModificationException
> om_1  |   at 
> java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1493)
> om_1  |   at 
> java.base/java.util.HashMap$ValueIterator.next(HashMap.java:1521)
> om_1  |   at 
> org.apache.hadoop.hdds.server.PrometheusMetricsSink.writeMetrics(PrometheusMetricsSink.java:123)
> om_1  |   at 
> org.apache.hadoop.hdds.server.PrometheusServlet.doGet(PrometheusServlet.java:43)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1870) ConcurrentModification at PrometheusMetricsSink

2019-08-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1870?focusedWorklogId=288510&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288510
 ]

ASF GitHub Bot logged work on HDDS-1870:


Author: ASF GitHub Bot
Created on: 04/Aug/19 05:26
Start Date: 04/Aug/19 05:26
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on issue #1179: HDDS-1870. 
ConcurrentModification at PrometheusMetricsSink
URL: https://github.com/apache/hadoop/pull/1179#issuecomment-517975031
 
 
   Thank You @adoroszlai for the contribution.
   I have committed this to the trunk.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 288510)
Time Spent: 1h 10m  (was: 1h)

> ConcurrentModification at PrometheusMetricsSink
> ---
>
> Key: HDDS-1870
> URL: https://issues.apache.org/jira/browse/HDDS-1870
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Encountered on {{ozoneperf}} compose env when running low on CPU:
> {code}
> om_1  | java.util.ConcurrentModificationException
> om_1  |   at 
> java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1493)
> om_1  |   at 
> java.base/java.util.HashMap$ValueIterator.next(HashMap.java:1521)
> om_1  |   at 
> org.apache.hadoop.hdds.server.PrometheusMetricsSink.writeMetrics(PrometheusMetricsSink.java:123)
> om_1  |   at 
> org.apache.hadoop.hdds.server.PrometheusServlet.doGet(PrometheusServlet.java:43)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1870) ConcurrentModification at PrometheusMetricsSink

2019-08-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1870?focusedWorklogId=288511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288511
 ]

ASF GitHub Bot logged work on HDDS-1870:


Author: ASF GitHub Bot
Created on: 04/Aug/19 05:26
Start Date: 04/Aug/19 05:26
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #1179: 
HDDS-1870. ConcurrentModification at PrometheusMetricsSink
URL: https://github.com/apache/hadoop/pull/1179
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 288511)
Time Spent: 1h 20m  (was: 1h 10m)

> ConcurrentModification at PrometheusMetricsSink
> ---
>
> Key: HDDS-1870
> URL: https://issues.apache.org/jira/browse/HDDS-1870
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Encountered on {{ozoneperf}} compose env when running low on CPU:
> {code}
> om_1  | java.util.ConcurrentModificationException
> om_1  |   at 
> java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1493)
> om_1  |   at 
> java.base/java.util.HashMap$ValueIterator.next(HashMap.java:1521)
> om_1  |   at 
> org.apache.hadoop.hdds.server.PrometheusMetricsSink.writeMetrics(PrometheusMetricsSink.java:123)
> om_1  |   at 
> org.apache.hadoop.hdds.server.PrometheusServlet.doGet(PrometheusServlet.java:43)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1488) Scm cli command to start/stop replication manager

2019-08-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1488?focusedWorklogId=288509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288509
 ]

ASF GitHub Bot logged work on HDDS-1488:


Author: ASF GitHub Bot
Created on: 04/Aug/19 05:19
Start Date: 04/Aug/19 05:19
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #1221: 
HDDS-1488. Scm cli command to start/stop replication manager.
URL: https://github.com/apache/hadoop/pull/1221#discussion_r310370527
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java
 ##
 @@ -469,6 +469,27 @@ public boolean forceExitSafeMode() throws IOException {
     return scm.exitSafeMode();
   }
 
+  @Override
+  public void startReplicationManager() {
+    AUDIT.logWriteSuccess(buildAuditMessageForSuccess(
+        SCMAction.START_REPLICATION_MANAGER, null));
+    scm.getReplicationManager().start();
+  }
+
+  @Override
+  public void stopReplicationManager() {
+    AUDIT.logWriteSuccess(buildAuditMessageForSuccess(
+        SCMAction.STOP_REPLICATION_MANAGER, null));
+    scm.getReplicationManager().stop();
 
 Review comment:
   Should we return a boolean flag indicating whether the stop succeeded?
   For example, when stop is called on an already-stopped replication monitor, 
   we could print something like "Replication Monitor is not running" (or a 
   similar message), as sketched below.
   
   The same applies to start.
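   
   A sketch of that suggestion (the {{isRunning()}} check is an assumed 
   ReplicationManager API, and the interface would need the same return type):
   {code:java}
   @Override
   public boolean stopReplicationManager() {
     if (!scm.getReplicationManager().isRunning()) {  // isRunning() assumed
       return false; // already stopped; the CLI can print a friendly message
     }
     AUDIT.logWriteSuccess(buildAuditMessageForSuccess(
         SCMAction.STOP_REPLICATION_MANAGER, null));
     scm.getReplicationManager().stop();
     return true;
   }
   {code}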
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 288509)
Time Spent: 0.5h  (was: 20m)

> Scm cli command to start/stop replication manager
> -
>
> Key: HDDS-1488
> URL: https://issues.apache.org/jira/browse/HDDS-1488
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It would be nice to have an scmcli command to start/stop the ReplicationManager 
> thread running in SCM.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1902) Fix checkstyle issues in ContainerStateMachine

2019-08-03 Thread Doroszlai, Attila (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899557#comment-16899557
 ] 

Doroszlai, Attila commented on HDDS-1902:
-

Hi [~nandakumar131], I think this is already fixed in HDDS-1878.

> Fix checkstyle issues in ContainerStateMachine
> --
>
> Key: HDDS-1902
> URL: https://issues.apache.org/jira/browse/HDDS-1902
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Minor
>
> Fix checkstyle issues in ContainerStateMachine:
> Line is longer than 80 characters (found 85).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11393) Hadoop KMS contacted by jobs which don’t use KMS encryption

2019-08-03 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899530#comment-16899530
 ] 

Wei-Chiu Chuang commented on HDFS-11393:


I think this is addressed by HADOOP-16350

> Hadoop KMS contacted by jobs which don’t use  KMS encryption
> 
>
> Key: HDFS-11393
> URL: https://issues.apache.org/jira/browse/HDFS-11393
> Project: Hadoop HDFS
>  Issue Type: Wish
> Environment: Hadoop 2.7.3, Spark 1.6.3 on Yarn, Oozie 4.2.3
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Priority: Minor
>
> Hello,
> After a few days of using Hadoop KMS on our pre-production platform, it was 
> noticed that after restarting the resourcemanagers, all Yarn jobs generated on 
> the platform contacted the KMS server, even if they didn't process 
> encrypted information. 
> {noformat}
> 2016-11-23 10:58:47,708 DEBUG AuthenticationFilter - Request 
> [http://uabigkms01:16000/kms/v1/?op=GETDELEGATIONTOKEN=rm%2Fuabigrm01%40SANDBOX.HADOOP]
>  triggering authentication
> 2016-11-23 10:58:47,735 DEBUG AuthenticationFilter - Request 
> [http://uabigkms01:16000/kms/v1/?op=GETDELEGATIONTOKEN=rm%2Fuabigrm01%40SANDBOX.HADOOP]
>  user  authenticated
> {noformat}
> Indeed, after some research we saw that KMS supports delegation tokens so that 
> processes without Kerberos credentials can authenticate to the Java 
> KeyProvider.
> Is there a way to bypass delegation tokens on KMS and only contact KMS when 
> jobs or users in HDFS use encrypted data?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-13483) Crypto command should give proper exception when user is trying to create an EZ with the same key with which it is already created

2019-08-03 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-13483.

Resolution: Duplicate

> Crypto command should give proper exception when user is trying to create an 
> EZ with the same key with which it is already created
> --
>
> Key: HDFS-13483
> URL: https://issues.apache.org/jira/browse/HDFS-13483
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, kms
>Affects Versions: 2.8.3
>Reporter: Harshakiran Reddy
>Assignee: Ranith Sardar
>Priority: Major
>
> {{Scenario:}}
>  # Create a dir
>  # Create an EZ for the above dir with key1
>  # Again, try to create a zone for the same directory with the same key1
> {noformat}
> hadoopclient> hadoop key list
> Listing keys for KeyProvider: 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider@152aa092
> key2
> key1
> hadoopclient> hdfs dfs -mkdir /kms
> hadoopclient> hdfs crypto -createZone -keyName key1 -path /kms
> Added encryption zone /kms
> hadoopclient> hdfs crypto -createZone -keyName key1 -path /kms
> RemoteException: Attempt to create an encryption zone for a non-empty 
> directory.{noformat}
> Actual Output:
>  ===
>  {{RemoteException:Attempt to create an encryption zone for non-empty 
> directory}}
> Expected Output:
>  =
>  Exception should be like {{EZ is already created with the same key}}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report

2019-08-03 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899527#comment-16899527
 ] 

Wei-Chiu Chuang commented on HDFS-12914:


One of the test failures in my branch-2 backport seems legit. 
TestSafeMode.testInitializeReplQueuesEarly timed out consecutively in my local 
tree. I'm going to look into this further.

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, 
> HDFS-12914.009.patch, HDFS-12914.branch-2.000.patch, 
> HDFS-12914.branch-2.001.patch, HDFS-12914.branch-2.002.patch, 
> HDFS-12914.branch-2.patch, HDFS-12914.branch-3.0.patch, 
> HDFS-12914.branch-3.1.001.patch, HDFS-12914.branch-3.1.002.patch, 
> HDFS-12914.branch-3.2.patch, HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and is 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected due to an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues, possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs' 
> next FBR is sent and/or forced.
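>
> A toy model of that control flow (simplified; not the real NameNode code):
> {code:java}
> class LeaseBugSketch {
>   /** Lease rejection is signaled by returning false, never by throwing. */
>   static boolean checkLease(long leaseId, long expectedId) {
>     return leaseId == expectedId;
>   }
>
>   static void blockReport(long leaseId, long expectedId) {
>     // Bug shape per the ticket: the same boolean that means "report
>     // processed, no stale storages" also carries "lease rejected".
>     boolean noStaleStorages = checkLease(leaseId, expectedId);
>     System.out.println("noStaleStorages=" + noStaleStorages);
>   }
> }
> {code}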



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14452) Make Op valueOf Public

2019-08-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899507#comment-16899507
 ] 

Hadoop QA commented on HDFS-14452:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
47s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 31s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
52s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 55m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14452 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976621/HDFS-14452.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux eb99d403f839 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 
22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8f40856 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27393/testReport/ |
| Max. process+thread count | 339 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: 
hadoop-hdfs-project/hadoop-hdfs-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27393/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Make Op valueOf Public
> 

[jira] [Commented] (HDDS-1788) Fix kerberos principal error in Ozone Recon

2019-08-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899504#comment-16899504
 ] 

Hudson commented on HDDS-1788:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17035 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17035/])
HDDS-1788. Fix kerberos principal error in Ozone Recon. (#1201) (bharat: rev 
ec1d453846ca7446b5372b11372311b65bef8a4b)
* (edit) hadoop-ozone/dist/src/main/compose/ozonesecure/docker-compose.yaml
* (edit) hadoop-ozone/dist/src/main/compose/ozonesecure/docker-config
* (edit) hadoop-hdds/common/src/main/resources/ozone-default.xml
* (delete) 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/OzoneConfigurationProvider.java
* (edit) 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/spi/impl/OzoneManagerServiceProviderImpl.java
* (add) 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/ConfigurationProvider.java
* (edit) 
hadoop-ozone/dist/src/main/compose/ozonesecure/docker-image/docker-krb5/Dockerfile-krb5
* (edit) 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/ReconServer.java
* (edit) 
hadoop-ozone/dist/src/main/compose/ozonesecure-mr/docker-image/docker-krb5/Dockerfile-krb5
* (edit) 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/ReconControllerModule.java


> Fix kerberos principal error in Ozone Recon
> ---
>
> Key: HDDS-1788
> URL: https://issues.apache.org/jira/browse/HDDS-1788
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Affects Versions: 0.4.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Recon fails to start up in a kerberized cluster with the following error:
> {code:java}
> Failed startup of context 
> o.e.j.w.WebAppContext@2009f9b0{/,file:///tmp/jetty-0.0.0.0-9888-recon-_-any-2565178148822292652.dir/webapp/,UNAVAILABLE}{/recon}
>  javax.servlet.ServletException: javax.servlet.ServletException: Principal 
> not defined in configuration at 
> org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.init(KerberosAuthenticationHandler.java:188)
>  at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.initializeAuthHandler(AuthenticationFilter.java:194)
>  at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:180)
>  at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:139) 
> at 
> org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:873) 
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:349)
>  at 
> org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1406) 
> at 
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1368) 
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:778)
>  at 
> org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:262)
>  at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:522) at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>  at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
>  at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:113)
>  at 
> org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
>  at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>  at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
>  at org.eclipse.jetty.server.Server.start(Server.java:427) at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
>  at 
> org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
>  at org.eclipse.jetty.server.Server.doStart(Server.java:394) at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>  at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1140) at 
> org.apache.hadoop.hdds.server.BaseHttpServer.start(BaseHttpServer.java:175) 
> at org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:102) at 
> org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:50) at 
> picocli.CommandLine.execute(CommandLine.java:1173) at 
> picocli.CommandLine.access$800(CommandLine.java:141) at 
> picocli.CommandLine$RunLast.handle(CommandLine.java:1367) at 
> picocli.CommandLine$RunLast.handle(CommandLine.java:1335) at 
> 

[jira] [Resolved] (HDDS-1788) Fix kerberos principal error in Ozone Recon

2019-08-03 Thread Bharat Viswanadham (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-1788.
--
   Resolution: Fixed
Fix Version/s: 0.5.0

> Fix kerberos principal error in Ozone Recon
> ---
>
> Key: HDDS-1788
> URL: https://issues.apache.org/jira/browse/HDDS-1788
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Affects Versions: 0.4.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Recon fails to start up in a kerberized cluster with the following error:
> {code:java}
> Failed startup of context 
> o.e.j.w.WebAppContext@2009f9b0{/,file:///tmp/jetty-0.0.0.0-9888-recon-_-any-2565178148822292652.dir/webapp/,UNAVAILABLE}{/recon}
>  javax.servlet.ServletException: javax.servlet.ServletException: Principal 
> not defined in configuration at 
> org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.init(KerberosAuthenticationHandler.java:188)
>  at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.initializeAuthHandler(AuthenticationFilter.java:194)
>  at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:180)
>  at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:139) 
> at 
> org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:873) 
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:349)
>  at 
> org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1406) 
> at 
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1368) 
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:778)
>  at 
> org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:262)
>  at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:522) at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>  at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
>  at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:113)
>  at 
> org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
>  at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>  at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
>  at org.eclipse.jetty.server.Server.start(Server.java:427) at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
>  at 
> org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
>  at org.eclipse.jetty.server.Server.doStart(Server.java:394) at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>  at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1140) at 
> org.apache.hadoop.hdds.server.BaseHttpServer.start(BaseHttpServer.java:175) 
> at org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:102) at 
> org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:50) at 
> picocli.CommandLine.execute(CommandLine.java:1173) at 
> picocli.CommandLine.access$800(CommandLine.java:141) at 
> picocli.CommandLine$RunLast.handle(CommandLine.java:1367) at 
> picocli.CommandLine$RunLast.handle(CommandLine.java:1335) at 
> picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
>  at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526) at 
> picocli.CommandLine.parseWithHandler(CommandLine.java:1465) at 
> org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65) at 
> org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56) at 
> org.apache.hadoop.ozone.recon.ReconServer.main(ReconServer.java:61)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1788) Fix kerberos principal error in Ozone Recon

2019-08-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1788?focusedWorklogId=288458&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288458
 ]

ASF GitHub Bot logged work on HDDS-1788:


Author: ASF GitHub Bot
Created on: 03/Aug/19 17:49
Start Date: 03/Aug/19 17:49
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on issue #1201: HDDS-1788. Fix 
kerberos principal error in Ozone Recon
URL: https://github.com/apache/hadoop/pull/1201#issuecomment-517942704
 
 
   Thank You @vivekratnavel for the contribution.
   I have committed this to the trunk.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 288458)
Time Spent: 2h 40m  (was: 2.5h)

> Fix kerberos principal error in Ozone Recon
> ---
>
> Key: HDDS-1788
> URL: https://issues.apache.org/jira/browse/HDDS-1788
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Affects Versions: 0.4.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Recon fails to start up in a kerberized cluster with the following error:
> {code:java}
> Failed startup of context 
> o.e.j.w.WebAppContext@2009f9b0{/,file:///tmp/jetty-0.0.0.0-9888-recon-_-any-2565178148822292652.dir/webapp/,UNAVAILABLE}{/recon}
>  javax.servlet.ServletException: javax.servlet.ServletException: Principal 
> not defined in configuration at 
> org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.init(KerberosAuthenticationHandler.java:188)
>  at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.initializeAuthHandler(AuthenticationFilter.java:194)
>  at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:180)
>  at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:139) 
> at 
> org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:873) 
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:349)
>  at 
> org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1406) 
> at 
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1368) 
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:778)
>  at 
> org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:262)
>  at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:522) at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>  at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
>  at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:113)
>  at 
> org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
>  at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>  at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
>  at org.eclipse.jetty.server.Server.start(Server.java:427) at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
>  at 
> org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
>  at org.eclipse.jetty.server.Server.doStart(Server.java:394) at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>  at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1140) at 
> org.apache.hadoop.hdds.server.BaseHttpServer.start(BaseHttpServer.java:175) 
> at org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:102) at 
> org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:50) at 
> picocli.CommandLine.execute(CommandLine.java:1173) at 
> picocli.CommandLine.access$800(CommandLine.java:141) at 
> picocli.CommandLine$RunLast.handle(CommandLine.java:1367) at 
> picocli.CommandLine$RunLast.handle(CommandLine.java:1335) at 
> picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
>  at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526) at 
> picocli.CommandLine.parseWithHandler(CommandLine.java:1465) at 
> org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65) at 
> 

[jira] [Work logged] (HDDS-1788) Fix kerberos principal error in Ozone Recon

2019-08-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1788?focusedWorklogId=288457&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288457
 ]

ASF GitHub Bot logged work on HDDS-1788:


Author: ASF GitHub Bot
Created on: 03/Aug/19 17:49
Start Date: 03/Aug/19 17:49
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #1201: 
HDDS-1788. Fix kerberos principal error in Ozone Recon
URL: https://github.com/apache/hadoop/pull/1201
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 288457)
Time Spent: 2.5h  (was: 2h 20m)

> Fix kerberos principal error in Ozone Recon
> ---
>
> Key: HDDS-1788
> URL: https://issues.apache.org/jira/browse/HDDS-1788
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Affects Versions: 0.4.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Recon fails to start up in a kerberized cluster with the following error:
> {code:java}
> Failed startup of context 
> o.e.j.w.WebAppContext@2009f9b0{/,file:///tmp/jetty-0.0.0.0-9888-recon-_-any-2565178148822292652.dir/webapp/,UNAVAILABLE}{/recon}
>  javax.servlet.ServletException: javax.servlet.ServletException: Principal 
> not defined in configuration at 
> org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.init(KerberosAuthenticationHandler.java:188)
>  at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.initializeAuthHandler(AuthenticationFilter.java:194)
>  at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:180)
>  at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:139) 
> at 
> org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:873) 
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:349)
>  at 
> org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1406) 
> at 
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1368) 
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:778)
>  at 
> org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:262)
>  at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:522) at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>  at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
>  at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:113)
>  at 
> org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
>  at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>  at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
>  at org.eclipse.jetty.server.Server.start(Server.java:427) at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
>  at 
> org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
>  at org.eclipse.jetty.server.Server.doStart(Server.java:394) at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>  at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1140) at 
> org.apache.hadoop.hdds.server.BaseHttpServer.start(BaseHttpServer.java:175) 
> at org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:102) at 
> org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:50) at 
> picocli.CommandLine.execute(CommandLine.java:1173) at 
> picocli.CommandLine.access$800(CommandLine.java:141) at 
> picocli.CommandLine$RunLast.handle(CommandLine.java:1367) at 
> picocli.CommandLine$RunLast.handle(CommandLine.java:1335) at 
> picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
>  at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526) at 
> picocli.CommandLine.parseWithHandler(CommandLine.java:1465) at 
> org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65) at 
> org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56) at 
> 

[jira] [Commented] (HDFS-13270) RBF: Router audit logger

2019-08-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899495#comment-16899495
 ] 

Hadoop QA commented on HDFS-13270:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HDFS-13270 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13270 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976620/HDFS-13270.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27392/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> RBF: Router audit logger
> 
>
> Key: HDFS-13270
> URL: https://issues.apache.org/jira/browse/HDFS-13270
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: maobaolong
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-13270.001.patch
>
>
> We can use a router audit logger to log the client info and cmd, because the 
> FSNamesystem#AuditLogger records every client as coming from the router.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14452) Make Op valueOf Public

2019-08-03 Thread hemanthboyina (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14452:
-
Attachment: HDFS-14452.patch
Status: Patch Available  (was: Open)

> Make Op valueOf Public
> --
>
> Key: HDFS-14452
> URL: https://issues.apache.org/jira/browse/HDFS-14452
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: hemanthboyina
>Priority: Minor
>  Labels: noob
> Attachments: HDFS-14452.patch
>
>
> Change signature of {{private static Op valueOf(byte code)}} to be public.  
> Right now, the only easy way to look up an Op is to pass in a {{DataInput}} 
> object, which is not all that flexible or efficient for custom 
> implementations that want to store the Op code a different way.
> https://github.com/apache/hadoop/blob/8c95cb9d6bef369fef6a8364f0c0764eba90e44a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Op.java#L53
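
As a rough illustration of the intended usage once the signature is public (the caller below is hypothetical; only {{Op.valueOf(byte)}} comes from the issue):

{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.Op;

// Hypothetical caller: a custom implementation that stores the opcode byte in
// its own framing format and wants to resolve it without a DataInput wrapper.
public final class OpCodes {
  private OpCodes() {
  }

  public static Op decode(byte[] frame, int opcodeOffset) {
    // With a public valueOf(byte), no DataInputStream indirection is needed.
    return Op.valueOf(frame[opcodeOffset]);
  }
}
{code}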



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13270) RBF: Router audit logger

2019-08-03 Thread hemanthboyina (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-13270:
-
Attachment: HDFS-13270.001.patch
Status: Patch Available  (was: Open)

> RBF: Router audit logger
> 
>
> Key: HDFS-13270
> URL: https://issues.apache.org/jira/browse/HDFS-13270
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: maobaolong
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-13270.001.patch
>
>
> We can use a router audit logger to log the client info and cmd, because the 
> FSNamesystem#AuditLogger records every client as coming from the router.
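
For context, a router audit line would presumably mirror the NameNode audit format while carrying the real client address, along these lines (a hypothetical sample, not output from the patch):

{noformat}
2019-08-03 12:00:00,000 INFO RouterAuditLogger: allowed=true  ugi=alice (auth:SIMPLE)  ip=/10.0.0.7  cmd=rename  src=/a  dst=/b  perm=alice:hadoop:rw-r--r--
{noformat}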



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14616) Add the warn log when the volume available space isn't enough

2019-08-03 Thread Ayush Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899483#comment-16899483
 ] 

Ayush Saxena commented on HDFS-14616:
-

Thanx [~alexking_lee] for the patch. For adding a log, there is no need for a 
UT. Just the LOG part is enough

> Add the warn log when the volume available space isn't enough
> -
>
> Key: HDFS-14616
> URL: https://issues.apache.org/jira/browse/HDFS-14616
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.7.2
>Reporter: liying
>Assignee: liying
>Priority: Minor
> Attachments: HDFS-14616.001.patch, HDFS-14616.002.patch
>
>
> In the hadoop2 version, there is no warning log when a disk in use becomes 
> unavailable. Therefore, the datanode log cannot be used to check whether the 
> disk was unavailable at a certain time, or to diagnose other problems.
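
A minimal sketch of the requested LOG addition, assuming it lands in the DataNode's volume-choosing path ({{volume}}, {{blockSize}}, and {{LOG}} are assumed locals/fields there, not the patch verbatim):

{code:java}
// Hypothetical sketch: warn when a volume cannot host the requested block.
long available = volume.getAvailable();
if (available < blockSize) {
  LOG.warn("Volume {} has insufficient available space: available={} bytes,"
      + " requested={} bytes", volume, available, blockSize);
}
{code}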



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1488) Scm cli command to start/stop replication manager

2019-08-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1488?focusedWorklogId=288452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288452
 ]

ASF GitHub Bot logged work on HDDS-1488:


Author: ASF GitHub Bot
Created on: 03/Aug/19 16:07
Start Date: 03/Aug/19 16:07
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1221: HDDS-1488. Scm 
cli command to start/stop replication manager.
URL: https://github.com/apache/hadoop/pull/1221#issuecomment-517935733
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 145 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 50 | Maven dependency ordering for branch |
   | +1 | mvninstall | 801 | trunk passed |
   | +1 | compile | 427 | trunk passed |
   | +1 | checkstyle | 84 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 1044 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 204 | trunk passed |
   | 0 | spotbugs | 528 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 754 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 26 | Maven dependency ordering for patch |
   | +1 | mvninstall | 601 | the patch passed |
   | +1 | compile | 434 | the patch passed |
   | +1 | cc | 434 | the patch passed |
   | +1 | javac | 434 | the patch passed |
   | -0 | checkstyle | 39 | hadoop-hdds: The patch generated 7 new + 0 
unchanged - 0 fixed = 7 total (was 0) |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 725 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 165 | the patch passed |
   | -1 | findbugs | 211 | hadoop-hdds generated 1 new + 0 unchanged - 0 fixed 
= 1 total (was 0) |
   ||| _ Other Tests _ |
   | +1 | unit | 325 | hadoop-hdds in the patch passed. |
   | -1 | unit | 2440 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 46 | The patch does not generate ASF License warnings. |
   | | | 9191 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | FindBugs | module:hadoop-hdds |
   |  |  Inconsistent synchronization of 
org.apache.hadoop.hdds.scm.container.ReplicationManager.replicationMonitor; 
locked 57% of time  Unsynchronized access at ReplicationManager.java:57% of 
time  Unsynchronized access at ReplicationManager.java:[line 169] |
   | Failed junit tests | 
hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.ozone.om.TestSecureOzoneManager |
   |   | hadoop.hdds.scm.pipeline.TestPipelineClose |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | hadoop.ozone.om.TestScmSafeMode |
   |   | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=18.09.7 Server=18.09.7 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1221/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1221 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle cc |
   | uname | Linux b631570a5548 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 
22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 8f40856 |
   | Default Java | 1.8.0_212 |
   | checkstyle | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1221/1/artifact/out/diff-checkstyle-hadoop-hdds.txt
 |
   | findbugs | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1221/1/artifact/out/new-findbugs-hadoop-hdds.html
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1221/1/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1221/1/testReport/ |
   | Max. process+thread count | 5243 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common hadoop-hdds/client hadoop-hdds/server-scm 
hadoop-hdds/tools U: hadoop-hdds |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1221/1/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   

[jira] [Updated] (HDDS-1488) Scm cli command to start/stop replication manager

2019-08-03 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1488:
--
Status: Patch Available  (was: Open)

> Scm cli command to start/stop replication manager
> -
>
> Key: HDDS-1488
> URL: https://issues.apache.org/jira/browse/HDDS-1488
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It would be nice to have scmcli command to start/stop the ReplicationManager 
> thread running in SCM
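
For reference, the intended usage would look roughly like the following (subcommand names are an assumption from the issue summary, not confirmed against the patch):

{noformat}
$ ozone scmcli replicationmanager start
$ ozone scmcli replicationmanager stop
$ ozone scmcli replicationmanager status
{noformat}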



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1488) Scm cli command to start/stop replication manager

2019-08-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1488?focusedWorklogId=288443=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288443
 ]

ASF GitHub Bot logged work on HDDS-1488:


Author: ASF GitHub Bot
Created on: 03/Aug/19 13:33
Start Date: 03/Aug/19 13:33
Worklog Time Spent: 10m 
  Work Description: nandakumar131 commented on pull request #1221: 
HDDS-1488. Scm cli command to start/stop replication manager.
URL: https://github.com/apache/hadoop/pull/1221
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 288443)
Time Spent: 10m
Remaining Estimate: 0h

> Scm cli command to start/stop replication manager
> -
>
> Key: HDDS-1488
> URL: https://issues.apache.org/jira/browse/HDDS-1488
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It would be nice to have scmcli command to start/stop the ReplicationManager 
> thread running in SCM



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1488) Scm cli command to start/stop replication manager

2019-08-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-1488:
-
Labels: pull-request-available  (was: )

> Scm cli command to start/stop replication manager
> -
>
> Key: HDDS-1488
> URL: https://issues.apache.org/jira/browse/HDDS-1488
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Blocker
>  Labels: pull-request-available
>
> It would be nice to have scmcli command to start/stop the ReplicationManager 
> thread running in SCM



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14264) Datanode du -sk command is slow

2019-08-03 Thread Amithsha (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899452#comment-16899452
 ] 

Amithsha commented on HDFS-14264:
-

Yes, this is due to the number of disks and their usage; in our environment of 
4.1Tb * 13 hard disks per node, it takes around 25min to report the FBR to the 
NN. Since YARN is also cohosted on the same node, the datanode drives high 
disk I/O at regular intervals and causes interruptions to YARN applications.

> Datanode du -sk command is slow
> ---
>
> Key: HDFS-14264
> URL: https://issues.apache.org/jira/browse/HDFS-14264
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Amithsha
>Priority: Major
>
> The datanode consumes a lot of time on the du -sk command and also creates 
> heavy IO on disk. In our prod systems, each disk holds about 3Tb of dfs 
> usage; to calculate it, the datanode spends 10-20min on average. NodeManagers 
> running on the same box also see heavy disk IO during this du -sk operation.
> The datanode should cache the usage, and the cache should not be cleared by 
> any other process.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14195) OIV: print out storage policy id in oiv Delimited output

2019-08-03 Thread Wang, Xinglong (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899421#comment-16899421
 ] 

Wang, Xinglong commented on HDFS-14195:
---

[~jojochuang] [~adam.antal] The failed tests are not related. Could you help to 
merge the patch?

> OIV: print out storage policy id in oiv Delimited output
> 
>
> Key: HDFS-14195
> URL: https://issues.apache.org/jira/browse/HDFS-14195
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Reporter: Wang, Xinglong
>Assignee: Wang, Xinglong
>Priority: Minor
> Attachments: HDFS-14195.001.patch, HDFS-14195.002.patch, 
> HDFS-14195.003.patch, HDFS-14195.004.patch, HDFS-14195.005.patch, 
> HDFS-14195.006.patch, HDFS-14195.007.patch, HDFS-14195.008.patch
>
>
> There is lacking of a method to get all folders and files with sort of 
> specified storage policy via command line, like ALL_SSD type.
> By adding storage policy id to oiv output, it will help with oiv 
> post-analysis to have a overview of all folders/files with specified storage 
> policy and to apply internal regulation based on this information.
>  
> Currently, for PBImageXmlWriter.java, HDFS-9835 added a function to print 
> out xattrs, which already include the storage policy.
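
For reference, the Delimited processor is invoked like this (standard oiv usage; the extra storage policy id column is what the patch would append to each row):

{noformat}
$ hdfs oiv -p Delimited -i /path/to/fsimage -o /tmp/fsimage.txt
{noformat}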



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14666) When using nfs3, the renaming fails when the target file exists.

2019-08-03 Thread fengchuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899413#comment-16899413
 ] 

fengchuang commented on HDFS-14666:
---

[~jojochuang] Yes, that is not what I expected. The prompt asks overwrite?; I 
input y, but the target is not actually overwritten and the output is an io 
error. The attached pictures show it. Thanks

> When using nfs3, the renaming fails when the target file exists.
> 
>
> Key: HDFS-14666
> URL: https://issues.apache.org/jira/browse/HDFS-14666
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Reporter: fengchuang
>Assignee: fengchuang
>Priority: Major
> Attachments: 1563945191461.jpg, 1563945214054.jpg, 
> HDFS-14666.001.patch
>
>
> mount -t nfs -o vers=3,proto=tcp,nolock  127.0.0.1:/  /home/test/nfs3test/
> cd  /home/test/nfs3test/
> echo "1">1.txt
> echo "2">2.txt
> mv 1.txt 2.txt
> it prompts overwrite? I answer y,
> but it fails.
> log:
>  
> org.apache.hadoop.fs.FileAlreadyExistsException: rename destination /2.txt 
> already exists
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.validateOverwrite(FSDirRenameOp.java:542)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.unprotectedRenameTo(FSDirRenameOp.java:383)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(FSDirRenameOp.java:296)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(FSDirRenameOp.java:246)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:2924)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename2(NameNodeRpcServer.java:1052)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename2(ClientNamenodeProtocolServerSideTranslatorPB.java:657)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1731)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>  at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
>  at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1574)
>  at 
> org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.rename(RpcProgramNfs3.java:1400)
>  at 
> org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.rename(RpcProgramNfs3.java:1328)
>  at 
> org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:2259)
>  at org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:188)
>  at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>  at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>  at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>  at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281)
>  at 
> org.apache.hadoop.oncrpc.RpcUtil$RpcMessageParserStage.messageReceived(RpcUtil.java:133)
>  at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>  at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>  at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>  at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>  at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
>  at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
>  at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
>  at 
> 

[jira] [Commented] (HDFS-13843) RBF: When we add/update mount entry to multiple destinations, unable to see the order information in mount entry points and in federation router UI

2019-08-03 Thread Ayush Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899401#comment-16899401
 ] 

Ayush Saxena commented on HDFS-13843:
-

[~zvenczel] any plans of working on this??
If not, I can take over!!!

> RBF: When we add/update mount entry to multiple destinations, unable to see 
> the order information in mount entry points and in federation router UI
> ---
>
> Key: HDFS-13843
> URL: https://issues.apache.org/jira/browse/HDFS-13843
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Reporter: Soumyapn
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-13843.01.patch, HDFS-13843.02.patch
>
>
> *Scenario:*
> Execute the below add/update commands for a single mount entry for a single 
> nameservice pointing to multiple destinations. 
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1,/tmp2,/tmp3
>  # hdfs dfsrouteradmin -update /apps1 hacluster /tmp1,/tmp2,/tmp3 -order 
> RANDOM
> *Actual:* With the above commands, the mount entry is successfully updated.
> But order information like HASH or RANDOM is not displayed in the mount 
> entries, and also not displayed in the federation router UI. However, order 
> information is updated properly when there are multiple nameservices. This 
> issue is with a single nameservice having multiple destinations.
> *Expected:* 
> *Order information should be updated in the mount entries so that the user 
> will know which order has been set.*
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13759) [HDFS Pagination]Does HDFS Java api Support Pagination?

2019-08-03 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899398#comment-16899398
 ] 

Wei-Chiu Chuang commented on HDFS-13759:


You should use DistributedFileSystem instead of DFSClient. The latter is a 
private API.
DistributedFileSystem offers DirListingIterator for the purpose you described.
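
A minimal sketch of that wrapping with the public API, assuming simple skip-based offset/limit semantics (the helper class below is hypothetical):

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public final class ListingPager {
  private ListingPager() {
  }

  /**
   * Return at most {@code limit} entries of {@code dir}, skipping the first
   * {@code offset}, without materializing the whole listing in memory.
   */
  public static List<FileStatus> listPage(FileSystem fs, Path dir,
      long offset, int limit) throws IOException {
    List<FileStatus> page = new ArrayList<>(limit);
    RemoteIterator<FileStatus> it = fs.listStatusIterator(dir);
    long seen = 0;
    while (it.hasNext() && page.size() < limit) {
      FileStatus status = it.next();
      if (seen++ >= offset) {
        page.add(status);
      }
    }
    return page;
  }
}
{code}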

> [HDFS Pagination]Does HDFS Java api Support Pagination?
> ---
>
> Key: HDFS-13759
> URL: https://issues.apache.org/jira/browse/HDFS-13759
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: fs, fs async
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: wuchang
>Priority: Major
>  Labels: HDFS, pagination
>
> I could use *FileSystem*
> {code:java}
> RemoteIterator<FileStatus> listed = fs.listStatusIterator(new 
> Path("hdfs://warehousestore/user/chang.wu/flat_level_1"));{code}
> like this to get files *asynchronously*.
> But in fact what I want is pagination support, where I could pass two 
> parameters, {{offset}} and {{limit}}, like MySQL does, to get part of the 
> files under some directory.
> I know I could just implement the pagination by wrapping 
> *listStatusIterator*, but I think it looks weird.
> So, why can't HDFS support pagination for users directly? 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14679) failed to add erasure code policies with example template

2019-08-03 Thread Ayush Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899395#comment-16899395
 ] 

Ayush Saxena commented on HDFS-14679:
-

Thanx [~yuanzhou] for the patch.

{code:java}
+ The codec name is case insensitive -->
{code}

Shouldn't it be case sensitive?

Apart from that, LGTM


> failed to add erasure code policies with example template
> -
>
> Key: HDFS-14679
> URL: https://issues.apache.org/jira/browse/HDFS-14679
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.1.2
>Reporter: Yuan Zhou
>Assignee: Yuan Zhou
>Priority: Minor
> Attachments: HDFS-14679-01.patch, HDFS-14679-02.patch, 
> fix_adding_EC_policy_example.diff
>
>
> Hi Hadoop developers,
>  
> Trying to do some quick tests with the erasure coding feature, I ran into an 
> issue when adding policies. The example of adding erasure code policies with 
> the provided template failed:
> {quote}./bin/hdfs ec -addPolicies -policyFile 
> /tmp/user_ec_policies.xml.template
>  2019-07-30 10:35:16,447 INFO util.ECPolicyLoader: Loading EC policy file 
> /tmp/user_ec_policies.xml.template
>  Add ErasureCodingPolicy XOR-2-1-128k succeed.
>  Add ErasureCodingPolicy RS-LEGACY-12-4-256k failed and error message is 
> Codec name RS-legacy is not supported
> {quote}
> The issue seems to be due to mismatched codec case (upper case vs lower 
> case). The codec is in upper case in the example template [1] while all 
> available codecs are lower case [2]. A way to fix it may be to simply convert 
> the codec to lower case when parsing the policy schema. A simple patch is 
> also attached here. 
> [1] 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/conf/user_ec_policies.xml.template#L51]
> [2][https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/ErasureCodeConstants.java#L28-L33]
> Thanks, -yuan
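
The fix described above amounts to a one-line normalization while parsing the policy file, roughly as follows (variable names are assumptions for illustration, not the attached patch verbatim):

{code:java}
import java.util.Locale;

// Hypothetical sketch: normalize the codec name read from the policy XML so
// that "RS-LEGACY" in the template matches the lower-case registered codecs.
String codecName = parsedCodecText.trim().toLowerCase(Locale.ROOT);
{code}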



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14669) TestDirectoryScanner#testDirectoryScannerInFederatedCluster fails intermittently in trunk

2019-08-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899396#comment-16899396
 ] 

Hudson commented on HDFS-14669:
---

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17034 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17034/])
HDFS-14669. TestDirectoryScanner#testDirectoryScannerInFederatedCluster 
(ayushsaxena: rev 8f40856f762bf6b19c8162015491a71883eb1203)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java


> TestDirectoryScanner#testDirectoryScannerInFederatedCluster fails 
> intermittently in trunk
> -
>
> Key: HDFS-14669
> URL: https://issues.apache.org/jira/browse/HDFS-14669
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.2.0
> Environment: env free
>Reporter: qiang Liu
>Assignee: qiang Liu
>Priority: Minor
>  Labels: scanner, test
> Fix For: 3.3.0
>
> Attachments: HDFS-14669-trunk-001.patch, HDFS-14669-trunk.002.patch, 
> HDFS-14669-trunk.003.patch, HDFS-14669-trunk.004.patch
>
>
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner#testDirectoryScannerInFederatedCluster
>  randomly fails because it writes files with the same name: the intent is to 
> write 2 files, but the 2 files get the same name, which causes a race between 
> the datanode deleting the block and the scan action counting the block.
>  
> Ref :: 
> [https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1207/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testDirectoryScannerInFederatedCluster/]
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1903) Use dynamic ports for SCM in TestSCMClientProtocolServer and TestSCMSecurityProtocolServer

2019-08-03 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1903:
--
Labels: newbie  (was: )

> Use dynamic ports for SCM in TestSCMClientProtocolServer and 
> TestSCMSecurityProtocolServer
> --
>
> Key: HDDS-1903
> URL: https://issues.apache.org/jira/browse/HDDS-1903
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Nanda kumar
>Priority: Major
>  Labels: newbie
>
> We should use dynamic port for SCM in the following test-cases
> * TestSCMClientProtocolServer
> * TestSCMSecurityProtocolServer



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1903) Use dynamic ports for SCM in TestSCMClientProtocolServer and TestSCMSecurityProtocolServer

2019-08-03 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1903:
-

 Summary: Use dynamic ports for SCM in TestSCMClientProtocolServer 
and TestSCMSecurityProtocolServer
 Key: HDDS-1903
 URL: https://issues.apache.org/jira/browse/HDDS-1903
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar


We should use dynamic port for SCM in the following test-cases
* TestSCMClientProtocolServer
* TestSCMSecurityProtocolServer
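
A typical way to do this in the test setup is to request an ephemeral port and let the OS choose; a rough sketch (the key names below are assumptions along the lines of {{ScmConfigKeys}}, not verified against the tests):

{code:java}
// Hypothetical sketch: bind to port 0 so parallel test runs cannot collide.
OzoneConfiguration conf = new OzoneConfiguration();
conf.set(ScmConfigKeys.OZONE_SCM_CLIENT_ADDRESS_KEY, "127.0.0.1:0");
conf.set(ScmConfigKeys.OZONE_SCM_SECURITY_SERVICE_ADDRESS_KEY, "127.0.0.1:0");
{code}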



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14608) DataNode$DataTransfer should be named

2019-08-03 Thread Ayush Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899393#comment-16899393
 ] 

Ayush Saxena commented on HDFS-14608:
-

Thanx [~elgoiri] for the patch.
v001 LGTM +1

> DataNode$DataTransfer should be named
> -
>
> Key: HDFS-14608
> URL: https://issues.apache.org/jira/browse/HDFS-14608
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-14608.000.patch, HDFS-14608.001.patch
>
>
> Currently, the {{DataTransfer}} thread has no name and it just outputs the 
> default {{toString()}}.
> This shows in the logs in jstack as something like:
> {code}
> 2019-06-25 11:01:01,211 INFO 
> [org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@609ed67a] 
> org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at 
> CO4AEAPC1AF:10010: Transmitted 
> BP-1191059133-10.1.2.3-145702348:blk_1113379522_69745835 
> (numBytes=485214) to 10.1.2.3/10.1.2.3:10010
> {code}
> As this uses the {{Daemon}} class, the name is set based on:
> {code}
>   public Daemon(Runnable runnable) {
> super(runnable);
> this.runnable = runnable;
> this.setName(((Object)runnable).toString());
>   }
> {code}
> We should implement toString to at least have the name of the block being 
> transferred, or something similar to what DataXceiver does (e.g., HDFS-3375).
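
A minimal sketch of the suggested override (field names below are assumptions about the {{DataTransfer}} inner class, for illustration only):

{code:java}
// Hypothetical sketch: Daemon uses runnable.toString() as the thread name, so
// overriding toString() on DataTransfer names the thread after the block.
@Override
public String toString() {
  return "DataTransfer " + block + " to " + java.util.Arrays.asList(targets);
}
{code}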



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14669) TestDirectoryScanner#testDirectoryScannerInFederatedCluster fails intermittently in trunk

2019-08-03 Thread Ayush Saxena (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-14669:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

> TestDirectoryScanner#testDirectoryScannerInFederatedCluster fails 
> intermittently in trunk
> -
>
> Key: HDFS-14669
> URL: https://issues.apache.org/jira/browse/HDFS-14669
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.2.0
> Environment: env free
>Reporter: qiang Liu
>Assignee: qiang Liu
>Priority: Minor
>  Labels: scanner, test
> Fix For: 3.3.0
>
> Attachments: HDFS-14669-trunk-001.patch, HDFS-14669-trunk.002.patch, 
> HDFS-14669-trunk.003.patch, HDFS-14669-trunk.004.patch
>
>
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner#testDirectoryScannerInFederatedCluster
>  randomly fails because it writes files with the same name: the intent is to 
> write 2 files, but the 2 files get the same name, which causes a race between 
> the datanode deleting the block and the scan action counting the block.
>  
> Ref :: 
> [https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1207/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testDirectoryScannerInFederatedCluster/]
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14669) TestDirectoryScanner#testDirectoryScannerInFederatedCluster fails intermittently in trunk

2019-08-03 Thread Ayush Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899391#comment-16899391
 ] 

Ayush Saxena commented on HDFS-14669:
-

Committed to trunk.
Thanx [~iamgd67] for the contribution!!!

> TestDirectoryScanner#testDirectoryScannerInFederatedCluster fails 
> intermittently in trunk
> -
>
> Key: HDFS-14669
> URL: https://issues.apache.org/jira/browse/HDFS-14669
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.2.0
> Environment: env free
>Reporter: qiang Liu
>Assignee: qiang Liu
>Priority: Minor
>  Labels: scanner, test
> Attachments: HDFS-14669-trunk-001.patch, HDFS-14669-trunk.002.patch, 
> HDFS-14669-trunk.003.patch, HDFS-14669-trunk.004.patch
>
>
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner#testDirectoryScannerInFederatedCluster
>  randomly fails because it writes files with the same name: the intent is to 
> write 2 files, but the 2 files get the same name, which causes a race between 
> the datanode deleting the block and the scan action counting the block.
>  
> Ref :: 
> [https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1207/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testDirectoryScannerInFederatedCluster/]
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1902) Fix checkstyle issues in ContainerStateMachine

2019-08-03 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1902:
-

 Summary: Fix checkstyle issues in ContainerStateMachine
 Key: HDDS-1902
 URL: https://issues.apache.org/jira/browse/HDDS-1902
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Nanda kumar
Assignee: Nanda kumar


Fix checkstyle issues in ContainerStateMachine:
Line is longer than 80 characters (found 85).




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14208) A large number missingblocks happend after failover to active.

2019-08-03 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899388#comment-16899388
 ] 

Wei-Chiu Chuang commented on HDFS-14208:


HDFS-10399?

> A large number missingblocks happend after failover to active.
> --
>
> Key: HDFS-14208
> URL: https://issues.apache.org/jira/browse/HDFS-14208
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.2, 3.1.1, 3.0.3
>Reporter: xuzq
>Priority: Major
>
> When the cluster is very large, Standby startup takes about 1 hour. After the 
> Standby starts and exits SafeMode, a large number of missing blocks often 
> appear after failover from standby to active, and the missing blocks 
> gradually disappear after 6 hours (the block report interval).
> According to the log, only half the blocks of one DataNode were processed in 
> SafeMode,
>  
> because the lease expired after processing some blocks of that DataNode.
> [MergeRequest|https://github.com/apache/hadoop/pull/467]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14264) Datanode du -sk command is slow

2019-08-03 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899386#comment-16899386
 ] 

Wei-Chiu Chuang commented on HDFS-14264:


Using the DF implementation added in HADOOP-12974 should help as well.
I am aware of certain users having this problem, but not all. I wonder what 
makes the difference: disk configuration (many vs a few disks), the local file 
system, or the operating system version?
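
Concretely, that means pointing the space-used implementation at the DF-based class (key and class name as added by HADOOP-12974; worth verifying against your release):

{code:xml}
<property>
  <name>fs.getspaceused.classname</name>
  <value>org.apache.hadoop.fs.DFCachingGetSpaceUsed</value>
</property>
{code}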

> Datanode du -sk command is slow
> ---
>
> Key: HDFS-14264
> URL: https://issues.apache.org/jira/browse/HDFS-14264
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Amithsha
>Priority: Major
>
> The datanode consumes a lot of time on the du -sk command and also creates 
> heavy IO on disk. In our prod systems, each disk holds about 3Tb of dfs 
> usage; to calculate it, the datanode spends 10-20min on average. NodeManagers 
> running on the same box also see heavy disk IO during this du -sk operation.
> The datanode should cache the usage, and the cache should not be cleared by 
> any other process.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14423) Percent (%) and plus (+) characters no longer work in WebHDFS

2019-08-03 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899384#comment-16899384
 ] 

Wei-Chiu Chuang commented on HDFS-14423:


Does HDFS-14323 fix it?

> Percent (%) and plus (+) characters no longer work in WebHDFS
> -
>
> Key: HDFS-14423
> URL: https://issues.apache.org/jira/browse/HDFS-14423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.2.0, 3.1.2
> Environment: Ubuntu 16.04, but I believe this is irrelevant.
>Reporter: Jing Wang
>Priority: Major
>
> The following commands with percent (%) no longer work starting with version 
> 3.1:
> {code:java}
> $ hadoop/bin/hdfs dfs -touchz webhdfs://localhost/%
> $ hadoop/bin/hdfs dfs -cat webhdfs://localhost/%
> cat: URLDecoder: Incomplete trailing escape (%) pattern
> {code}
> Also, plus (+ ) characters get turned into spaces when doing DN operations:
> {code:java}
> $ hadoop/bin/hdfs dfs -touchz webhdfs://localhost/a+b
> $ hadoop/bin/hdfs dfs -mkdir webhdfs://localhost/c+d
> $ hadoop/bin/hdfs dfs -ls /
> Found 4 items
> -rw-r--r--   1 jing supergroup  0 2019-04-12 11:20 /a b
> drwxr-xr-x   - jing supergroup  0 2019-04-12 11:21 /c+d
> {code}
> I can confirm that these commands work correctly on 2.9 and 3.0. Also, the 
> usual hdfs:// client works as expected.
> I suspect a relation with HDFS-13176 or HDFS-13582, but I'm not sure what the 
> right fix is. Note that Hive uses % to escape special characters in partition 
> values, so banning % might not be a good option. For example, Hive will 
> create a paths like {{table_name/partition_key=%2F}} when 
> {{partition_key='/'}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12484) Undefined -expunge behavior after 2.8

2019-08-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899381#comment-16899381
 ] 

Hadoop QA commented on HDFS-12484:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HDFS-12484 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12484 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12888234/HDFS-12484.002.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27391/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Undefined -expunge behavior after 2.8
> -
>
> Key: HDFS-12484
> URL: https://issues.apache.org/jira/browse/HDFS-12484
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Priority: Major
> Attachments: HDFS-12484.001.patch, HDFS-12484.002.patch
>
>
> (Rewrote the description to reflect the actual behavior)
> Hadoop 2.8 added a feature to support trash inside encryption zones, which is 
> a great feature to have.
> However, when it comes to -expunge, the behavior is not well defined. A 
> superuser invoking -expunge removes files under all encryption zone trash 
> directories belonging to the user. On the other hand, because 
> listEncryptionZones requires superuser permission, a non-privileged user 
> invoking -expunge can remove files under the home directory, but not under 
> encryption zones.
> Moreover, the command prints a scary warning message that looks annoying.
> {noformat}
> 2017-09-21 01:22:44,744 [main] WARN  hdfs.DFSClient 
> (DistributedFileSystem.java:getTrashRoots(2795)) - Cannot get all encrypted 
> trash roots
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
>  Access denied for user user. Superuser privilege is required
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSuperuserPrivilege(FSPermissionChecker.java:130)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkSuperuserPrivilege(FSNamesystem.java:4556)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.listEncryptionZones(FSNamesystem.java:7048)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.listEncryptionZones(NameNodeRpcServer.java:2053)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.listEncryptionZones(ClientNamenodeProtocolServerSideTranslatorPB.java:1477)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1490)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1436)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1346)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy25.listEncryptionZones(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.listEncryptionZones(ClientNamenodeProtocolTranslatorPB.java:1510)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> 

[jira] [Commented] (HDFS-12826) Document Saying the RPC port, But it's required IPC port in HDFS Federation Document.

2019-08-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899380#comment-16899380
 ] 

Hudson commented on HDFS-12826:
---

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17033 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17033/])
HDFS-12826. Document Saying the RPC port, But it's required IPC port in 
(ayushsaxena: rev e503db5f449411eb6227c5b201d3dbfe2fa314bb)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/Federation.md


> Document Saying the RPC port, But it's required IPC port in HDFS Federation 
> Document.
> -
>
> Key: HDFS-12826
> URL: https://issues.apache.org/jira/browse/HDFS-12826
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover, documentation
>Affects Versions: 3.0.0-beta1
>Reporter: Harshakiran Reddy
>Assignee: usharani
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-12826.patch
>
>
> In {{Adding a new Namenode to an existing HDFS cluster}} , refreshNamenodes 
> command required IPC port but in Documentation it's saying the RPC port.
> http://hadoop.apache.org/docs/r3.0.0-beta1/hadoop-project-dist/hadoop-hdfs/Federation.html#Balancer
> {noformat} 
> bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin 
> -refreshNamenodes host-name:65110
> refreshNamenodes: Unknown protocol: 
> org.apache.hadoop.hdfs.protocol.ClientDatanodeProtocol
> bin.:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin 
> -refreshNamenodes
> Usage: hdfs dfsadmin [-refreshNamenodes datanode-host:ipc_port]
> bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin 
> -refreshNamenodes host-name:50077
> bin>:~/hdfsdata/HA/install/hadoop/datanode/bin>
> {noformat} 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1901) Fix Ozone HTTP WebConsole Authentication

2019-08-03 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDDS-1901:


 Summary: Fix Ozone HTTP WebConsole Authentication
 Key: HDDS-1901
 URL: https://issues.apache.org/jira/browse/HDDS-1901
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Vivek Ratnavel Subramanian
Assignee: Xiaoyu Yao


This was found during integration testing: http authentication is enabled, but 
anonymous users can still access the ozone http web consoles such as scm:9876 
or om:9874. This can be reproduced with the following configurations added to 
the ozonesecure docker-compose.

{code}

CORE-SITE.XML_hadoop.http.authentication.simple.anonymous.allowed=false

CORE-SITE.XML_hadoop.http.authentication.signature.secret.file=/etc/security/http_secret

CORE-SITE.XML_hadoop.http.authentication.type=kerberos

CORE-SITE.XML_hadoop.http.authentication.kerberos.principal=HTTP/_h...@example.com

CORE-SITE.XML_hadoop.http.authentication.kerberos.keytab=/etc/security/keytabs/HTTP.keytab

CORE-SITE.XML_hadoop.http.filter.initializers=org.apache.hadoop.security.AuthenticationFilterInitializer

{code}

After debugging into the KerberosAuthenticationFilter, the root cause is that 
the name of the keytab key does not follow the AuthenticationFilter 
convention. The fix is to change 

hdds.scm.http.kerberos.keytab.file to hdds.scm.http.kerberos.keytab and
hdds.om.http.kerberos.keytab.file to hdds.om.http.kerberos.keytab

I will also add an integration test for this under ozonesecure docker-compose. 
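
With that rename, the secure docker-compose would carry keys of this shape (mirroring the CORE-SITE.XML_ convention above; the exact file prefix is an assumption):

{code}
OZONE-SITE.XML_hdds.scm.http.kerberos.keytab=/etc/security/keytabs/HTTP.keytab
OZONE-SITE.XML_hdds.om.http.kerberos.keytab=/etc/security/keytabs/HTTP.keytab
{code}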



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9668) Optimize the locking in FsDatasetImpl

2019-08-03 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899379#comment-16899379
 ] 

Anoop Sam John commented on HDFS-9668:
--

[~zhangchen] .. No work is happening around this patch. So can you detail your 
usage? What different types of block devices are in use in the HSM? 

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
>Priority: Major
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-19.patch, HDFS-9668-2.patch, 
> HDFS-9668-20.patch, HDFS-9668-21.patch, HDFS-9668-22.patch, 
> HDFS-9668-23.patch, HDFS-9668-23.patch, HDFS-9668-24.patch, 
> HDFS-9668-25.patch, HDFS-9668-26.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. The following is the result.
> !execution_time.png!
> The finalizeBlock, addBlock and createRbw operations on HDD take a really 
> long time under heavy load.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage 

[jira] [Commented] (HDFS-14605) Note missing on expunge command description for encrypted zones

2019-08-03 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899378#comment-16899378
 ] 

Wei-Chiu Chuang commented on HDFS-14605:


Please see my patch at HDFS-12484 and see if it makes sense. I'll resolve this 
one as a dup.

> Note missing on expunge command description for encrypted zones
> ---
>
> Key: HDFS-14605
> URL: https://issues.apache.org/jira/browse/HDFS-14605
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.7.5, 3.0.0, 3.1.0
>Reporter: Srinivasu Majeti
>Priority: Minor
>  Labels: documentation-update
> Fix For: 2.7.3, 2.7.5, 3.0.0, 3.1.0
>
>
> The expunge command is supported for both encrypted and non-encrypted hdfs 
> paths. This operation initially needs to discover/list all such paths. 
> Listing/discovering encrypted zone paths is only supported for the superuser, 
> and the expunge command misleads us by printing the message below, though it 
> is only a warning.
> We could add a note to the expunge command description saying that the 
> command supports encrypted zone paths only when run as the superuser, and 
> that it will continue listing and performing the operation for all 
> non-encrypted hdfs paths.
> 19/06/25 08:30:13 WARN hdfs.DFSClient: Cannot get all encrypted trash roots
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
>  Access denied for user ambari-qa. Superuser privilege is required
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSuperuserPrivilege(FSPermissionChecker.java:130)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-14605) Note missing on expunge command description for encrypted zones

2019-08-03 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-14605.

Resolution: Duplicate

> Note missing on expunge command description for encrypted zones
> ---
>
> Key: HDFS-14605
> URL: https://issues.apache.org/jira/browse/HDFS-14605
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.7.5, 3.0.0, 3.1.0
>Reporter: Srinivasu Majeti
>Priority: Minor
>  Labels: documentation-update
> Fix For: 3.1.0, 3.0.0, 2.7.5, 2.7.3
>
>
> The expunge command is supported for both encrypted and non-encrypted hdfs 
> paths. This operation first needs to discover/list all such paths. 
> Listing/discovering encrypted zone paths is only supported for the 
> superuser, and the expunge command misleads us by printing the message 
> below, even though it is only a warning. We could add a note to the expunge 
> command description saying that the command supports encrypted zone paths 
> only when run as the superuser, and that it will still list and perform the 
> operation for all non-encrypted hdfs paths.
> 19/06/25 08:30:13 WARN hdfs.DFSClient: Cannot get all encrypted trash roots
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
>  Access denied for user ambari-qa. Superuser privilege is required
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSuperuserPrivilege(FSPermissionChecker.java:130)






[jira] [Commented] (HDFS-12826) Document Saying the RPC port, But it's required IPC port in HDFS Federation Document.

2019-08-03 Thread Ayush Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899376#comment-16899376
 ] 

Ayush Saxena commented on HDFS-12826:
-

Committed to trunk.
Thanks [~peruguusha] for the contribution and [~Harsha1206] for the report!

> Document Saying the RPC port, But it's required IPC port in HDFS Federation 
> Document.
> -
>
> Key: HDFS-12826
> URL: https://issues.apache.org/jira/browse/HDFS-12826
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, documentation
>Affects Versions: 3.0.0-beta1
>Reporter: Harshakiran Reddy
>Assignee: usharani
>Priority: Minor
> Attachments: HDFS-12826.patch
>
>
> In {{Adding a new Namenode to an existing HDFS cluster}}, the 
> refreshNamenodes command requires the IPC port, but the documentation says 
> the RPC port.
> http://hadoop.apache.org/docs/r3.0.0-beta1/hadoop-project-dist/hadoop-hdfs/Federation.html#Balancer
> {noformat} 
> bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin 
> -refreshNamenodes host-name:65110
> refreshNamenodes: Unknown protocol: 
> org.apache.hadoop.hdfs.protocol.ClientDatanodeProtocol
> bin.:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin 
> -refreshNamenodes
> Usage: hdfs dfsadmin [-refreshNamenodes datanode-host:ipc_port]
> bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin 
> -refreshNamenodes host-name:50077
> bin>:~/hdfsdata/HA/install/hadoop/datanode/bin>
> {noformat} 
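>
> For reference, the port that refreshNamenodes expects is the datanode IPC 
> port, i.e. the port part of dfs.datanode.ipc.address; the 50077 above is 
> just that cluster's custom value. A minimal hdfs-site.xml sketch, assuming 
> the same custom port:
> {code:xml}
> <property>
>   <!-- The datanode IPC server address; refreshNamenodes must target
>        this port, not the NameNode RPC port. -->
>   <name>dfs.datanode.ipc.address</name>
>   <value>0.0.0.0:50077</value>
> </property>
> {code}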






[jira] [Updated] (HDFS-12826) Document Saying the RPC port, But it's required IPC port in HDFS Federation Document.

2019-08-03 Thread Ayush Saxena (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-12826:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

> Document Saying the RPC port, But it's required IPC port in HDFS Federation 
> Document.
> -
>
> Key: HDFS-12826
> URL: https://issues.apache.org/jira/browse/HDFS-12826
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, documentation
>Affects Versions: 3.0.0-beta1
>Reporter: Harshakiran Reddy
>Assignee: usharani
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-12826.patch
>
>
> In {{Adding a new Namenode to an existing HDFS cluster}}, the 
> refreshNamenodes command requires the IPC port, but the documentation says 
> the RPC port.
> http://hadoop.apache.org/docs/r3.0.0-beta1/hadoop-project-dist/hadoop-hdfs/Federation.html#Balancer
> {noformat} 
> bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin 
> -refreshNamenodes host-name:65110
> refreshNamenodes: Unknown protocol: 
> org.apache.hadoop.hdfs.protocol.ClientDatanodeProtocol
> bin.:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin 
> -refreshNamenodes
> Usage: hdfs dfsadmin [-refreshNamenodes datanode-host:ipc_port]
> bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin 
> -refreshNamenodes host-name:50077
> bin>:~/hdfsdata/HA/install/hadoop/datanode/bin>
> {noformat} 






[jira] [Commented] (HDFS-14627) Improvements to make slow archive storage works on HDFS

2019-08-03 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899375#comment-16899375
 ] 

Wei-Chiu Chuang commented on HDFS-14627:


[~hadoop_yangyun] thanks for filing the jira and making this proposal.
Would you like to contribute a patch?

> Improvements to make slow archive storage works on HDFS
> ---
>
> Key: HDFS-14627
> URL: https://issues.apache.org/jira/browse/HDFS-14627
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yang Yun
>Priority: Minor
> Attachments: data_flow_between_datanode_and_aws_s3.jpg
>
>
> In our setup, we mount archival storage from a remote system. The write 
> speed is about 20M/sec, the read speed is about 40M/sec, and normal file 
> operations, for example 'ls', are time consuming.
> We added some improvements to make this kind of archive storage work in the 
> current hdfs system:
> 1. Apply a multiplier to the read/write timeout if the block is saved on 
> archive storage (see the sketch after this list).
> 2. Save the replica cache file of archive storage to another, faster disk 
> for a quick datanode restart; the shutdown hook may not execute if the 
> saving takes too long.
> 3. Check the mounted file system before using mounted archive storage.
> 4. Reduce or avoid calling DF while generating the heartbeat report for 
> archive storage.
> 5. Add an option to skip archive blocks during decommission.
> 6. Use multiple threads to scan archive storage.
> 7. Check archive storage errors against a retry limit.
> 8. Add an option to disable block scanning on archive storage.
> 9. Sleep for one heartbeat interval if there are too many differences when 
> calling checkAndUpdate in DirectoryScanner.
> 10. An auto-service to scan the fsimage and set the storage policy of files 
> according to policy.
> 11. An auto-service to call the mover to move blocks to the right storage.
> 12. Dedup files on remote storage if the storage is reliable.
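>
> As a rough illustration of item 1, a minimal sketch (the multiplier and 
> its config key are hypothetical, invented for this sketch rather than 
> taken from a patch):
> {code:java}
> import org.apache.hadoop.fs.StorageType;
>
> public class ArchiveTimeouts {
>   // Hypothetical value, e.g. loaded from a made-up key such as
>   // "dfs.datanode.archive.timeout.multiplier".
>   private final int archiveTimeoutMultiplier;
>
>   public ArchiveTimeouts(int archiveTimeoutMultiplier) {
>     this.archiveTimeoutMultiplier = archiveTimeoutMultiplier;
>   }
>
>   /** Scale the base read/write timeout when the replica is on ARCHIVE. */
>   public long effectiveTimeoutMs(long baseTimeoutMs, StorageType type) {
>     return type == StorageType.ARCHIVE
>         ? baseTimeoutMs * archiveTimeoutMultiplier
>         : baseTimeoutMs;
>   }
> }
> {code}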






[jira] [Updated] (HDFS-12826) Document Saying the RPC port, But it's required IPC port in HDFS Federation Document.

2019-08-03 Thread Ayush Saxena (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-12826:

Summary: Document Saying the RPC port, But it's required IPC port in HDFS 
Federation Document.  (was: Document Saying the RPC port, But it's required IPC 
port in Balancer Document.)

> Document Saying the RPC port, But it's required IPC port in HDFS Federation 
> Document.
> -
>
> Key: HDFS-12826
> URL: https://issues.apache.org/jira/browse/HDFS-12826
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, documentation
>Affects Versions: 3.0.0-beta1
>Reporter: Harshakiran Reddy
>Assignee: usharani
>Priority: Minor
> Attachments: HDFS-12826.patch
>
>
> In {{Adding a new Namenode to an existing HDFS cluster}}, the 
> refreshNamenodes command requires the IPC port, but the documentation says 
> the RPC port.
> http://hadoop.apache.org/docs/r3.0.0-beta1/hadoop-project-dist/hadoop-hdfs/Federation.html#Balancer
> {noformat} 
> bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin 
> -refreshNamenodes host-name:65110
> refreshNamenodes: Unknown protocol: 
> org.apache.hadoop.hdfs.protocol.ClientDatanodeProtocol
> bin.:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin 
> -refreshNamenodes
> Usage: hdfs dfsadmin [-refreshNamenodes datanode-host:ipc_port]
> bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin 
> -refreshNamenodes host-name:50077
> bin>:~/hdfsdata/HA/install/hadoop/datanode/bin>
> {noformat} 






[jira] [Commented] (HDFS-14617) Improve fsimage load time by writing sub-sections to the fsimage index

2019-08-03 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899369#comment-16899369
 ] 

Wei-Chiu Chuang commented on HDFS-14617:


[~hexiaoqiao] thanks for your input! It looks like your SerialLoading.svg and 
ParallelLoading.svg are the same. Do you still have those charts by any chance?

> Improve fsimage load time by writing sub-sections to the fsimage index
> --
>
> Key: HDFS-14617
> URL: https://issues.apache.org/jira/browse/HDFS-14617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-14617.001.patch, ParallelLoading.svg, 
> SerialLoading.svg, dirs-single.svg, inodes.svg
>
>
> Loading an fsimage is basically a single-threaded process. The current 
> fsimage is written out in sections, e.g. iNode, iNode_Directory, Snapshots, 
> Snapshot_Diff etc. Then at the end of the file, an index is written that 
> contains the offset and length of each section. The image loader code uses 
> this index to initialize an input stream to read and process each section. 
> It is important that one section is fully loaded before another is started, 
> as the next section depends on the results of the previous one.
> What I would like to propose is the following:
> 1. When writing the image, we can optionally output sub_sections to the 
> index. That way, a given section would effectively be split into several 
> sections, eg:
> {code:java}
>inode_section offset 10 length 1000
>  inode_sub_section offset 10 length 500
>  inode_sub_section offset 510 length 500
>  
>inode_dir_section offset 1010 length 1000
>  inode_dir_sub_section offset 1010 length 500
>  inode_dir_sub_section offset 1010 length 500
> {code}
> Here you can see we still have the original section index, but then we also 
> have sub-section entries that cover the entire section. Then a processor can 
> either read the full section in serial, or read each sub-section in parallel.
> 2. In the Image Writer code, we should set a target number of sub-sections, 
> and then based on the total inodes in memory, it will create that many 
> sub-sections per major image section. I think the only sections worth doing 
> this for are inode, inode_reference, inode_dir and snapshot_diff. All others 
> tend to be fairly small in practice.
> 3. If there are under some threshold of inodes (eg 10M) then don't bother 
> with the sub-sections as a serial load only takes a few seconds at that scale.
> 4. The image loading code can then have a switch to enable 'parallel 
> loading' and a 'number of threads' setting; when enabled it uses the 
> sub-sections, otherwise it falls back to the existing logic and reads the 
> entire section serially (see the sketch after this list).
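>
> A minimal sketch of the parallel load in point 4, assuming a hypothetical 
> SubSection(offset, length) entry type and per-sub-section loader (this is 
> not the attached patch):
> {code:java}
> import java.io.InputStream;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.Future;
>
> // SubSection, openImageStreamAt and loadINodeSubSection are hypothetical
> // stand-ins for the real index entries and loader code.
> void loadSectionInParallel(List<SubSection> subSections, int numThreads)
>     throws Exception {
>   ExecutorService pool = Executors.newFixedThreadPool(numThreads);
>   List<Future<?>> tasks = new ArrayList<>();
>   for (SubSection sub : subSections) {
>     tasks.add(pool.submit(() -> {
>       // Each worker opens its own stream at the sub-section offset, so
>       // sub-sections of one section can be decoded independently.
>       try (InputStream in = openImageStreamAt(sub.offset, sub.length)) {
>         loadINodeSubSection(in);
>       }
>       return null;
>     }));
>   }
>   for (Future<?> t : tasks) {
>     t.get();  // the whole section must finish before the next one starts
>   }
>   pool.shutdown();
> }
> {code}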
> Working with a large image of 316M inodes and 35GB on disk, I have a proof 
> of concept of this change working, allowing just inode and inode_dir to be 
> loaded in parallel, but I believe inode_reference and snapshot_diff can be 
> made parallel with the same technique.
> Some benchmarks I have are as follows:
> {code:java}
> Threads      1     2     3     4
> ---------------------------------
> inodes     448   290   226   189
> inode_dir  326   211   170   161
> Total      927   651   535   488   (MD5 calculation about 100 seconds)
> {code}
> The above table shows the time in seconds to load the inode section and the 
> inode_directory section, and then the total load time of the image.
> With 4 threads using the above technique, we are able to more than halve 
> the load time of the two sections. With the patch in HDFS-13694 it would 
> take a further 100 seconds off the run time, going from 927 seconds to 388, 
> which is a significant improvement. Adding more threads beyond 4 has 
> diminishing returns, as there are some synchronized points in the loading 
> code that protect the in-memory structures.


