[jira] [Comment Edited] (HDDS-245) Handle ContainerReports in the SCM

2018-07-26 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559283#comment-16559283
 ] 

Lokesh Jain edited comment on HDDS-245 at 7/27/18 5:41 AM:
---

Thanks [~elek] for working on this! The patch looks very good to me. I have a 
few minor comments.
 # ReportResult:58,59 - we can keep the missingContainers and newContainers as 
null.
 # ContainerMapping#getContainerWithPipeline needs to be updated for the closed 
container case. For closed containers we need to fetch the datanodes from 
ContainerStateMap and return the appropriate pipeline information.
 # START_REPLICATION is currently not fired by any publisher. I guess it will 
be part of another jira?
 # We are currently processing the report as soon as it is received. Are we 
handling the case where a container is added on one DN and has been removed from 
another DN? In such a case we might send out a false replicate event, as the 
replication count would still match the replication factor (see the sketch 
after this list).
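
To illustrate point 4, here is a minimal, hypothetical sketch (the class and names 
below are made up for illustration, not the actual SCM code) of emitting 
replication decisions only from a replica view aggregated across all datanode 
reports, rather than from a single report:

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy replica tracker; not the real SCM ContainerReportHandler. */
public class ReplicaTracker {
  private final Map<Long, Set<String>> containerToDatanodes = new HashMap<>();
  private final int replicationFactor;

  public ReplicaTracker(int replicationFactor) {
    this.replicationFactor = replicationFactor;
  }

  /** Apply one datanode's full container report; return containers needing replication. */
  public synchronized Set<Long> processReport(String datanodeId, Set<Long> reportedContainers) {
    // Remove this datanode from containers it no longer reports.
    for (Map.Entry<Long, Set<String>> e : containerToDatanodes.entrySet()) {
      if (!reportedContainers.contains(e.getKey())) {
        e.getValue().remove(datanodeId);
      }
    }
    // Add it to the containers it does report.
    for (Long containerId : reportedContainers) {
      containerToDatanodes.computeIfAbsent(containerId, k -> new HashSet<>()).add(datanodeId);
    }
    // Decide replication only from the aggregated view across all datanodes.
    Set<Long> underReplicated = new HashSet<>();
    for (Map.Entry<Long, Set<String>> e : containerToDatanodes.entrySet()) {
      if (e.getValue().size() < replicationFactor) {
        underReplicated.add(e.getKey());
      }
    }
    return underReplicated;
  }
}
{code}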


was (Author: ljain):
Thanks [~elek] for working on this! I have a few minor comments.
 # ReportResult:58,59 - we can keep the missingContainers and newContainers as 
null.
 # ContainerMapping#getContainerWithPipeline needs to be updated for closed 
container case. For closed containers we need to fetch the datanodes from 
ContainerStateMap and return the appropriate pipeline information.
 # START_REPLICATION is currently not fired by any publisher. I guess it will 
be part of another jira?
 # We are currently processing the report as soon it is received. Are we 
handling the case when a container is added in one DN and has been removed from 
another DN? In such a case we might be sending out a false replicate event as 
replication count would still match the replication factor.

> Handle ContainerReports in the SCM
> --
>
> Key: HDDS-245
> URL: https://issues.apache.org/jira/browse/HDDS-245
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-245.001.patch, HDDS-245.002.patch, 
> HDDS-245.003.patch
>
>
> HDDS-242 provides a new class, ContainerReportHandler, which can handle the 
> ContainerReports from the SCMHeartbeatDispatcher.
> HDDS-228 introduces a new map to store the container -> datanode[] mapping.
> HDDS-199 implements the ReplicationManager, which can send commands to the 
> datanodes to copy the containers.
> To wire all these components together, we need to add the implementation to the 
> ContainerReportHandler (created in HDDS-242).
> The ContainerReportHandler should process the new ContainerReportForDatanode 
> events, update the containerStateMap and node2ContainerMap, calculate the 
> missing/duplicate containers, and send the ReplicateCommand to the 
> ReplicationManager.






[jira] [Commented] (HDDS-245) Handle ContainerReports in the SCM

2018-07-26 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559283#comment-16559283
 ] 

Lokesh Jain commented on HDDS-245:
--

Thanks [~elek] for working on this! I have a few minor comments.
 # ReportResult:58,59 - we can keep the missingContainers and newContainers as 
null.
 # ContainerMapping#getContainerWithPipeline needs to be updated for the closed 
container case. For closed containers we need to fetch the datanodes from 
ContainerStateMap and return the appropriate pipeline information.
 # START_REPLICATION is currently not fired by any publisher. I guess it will 
be part of another jira?
 # We are currently processing the report as soon as it is received. Are we 
handling the case where a container is added on one DN and has been removed from 
another DN? In such a case we might send out a false replicate event, as the 
replication count would still match the replication factor.

> Handle ContainerReports in the SCM
> --
>
> Key: HDDS-245
> URL: https://issues.apache.org/jira/browse/HDDS-245
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-245.001.patch, HDDS-245.002.patch, 
> HDDS-245.003.patch
>
>
> HDDS-242 provides a new class, ContainerReportHandler, which can handle the 
> ContainerReports from the SCMHeartbeatDispatcher.
> HDDS-228 introduces a new map to store the container -> datanode[] mapping.
> HDDS-199 implements the ReplicationManager, which can send commands to the 
> datanodes to copy the containers.
> To wire all these components together, we need to add the implementation to the 
> ContainerReportHandler (created in HDDS-242).
> The ContainerReportHandler should process the new ContainerReportForDatanode 
> events, update the containerStateMap and node2ContainerMap, calculate the 
> missing/duplicate containers, and send the ReplicateCommand to the 
> ReplicationManager.






[jira] [Commented] (HDDS-296) OMMetadataManagerLock is held by getPendingDeletionKeys for a full table scan

2018-07-26 Thread Xiaoyu Yao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559247#comment-16559247
 ] 

Xiaoyu Yao commented on HDDS-296:
-

[~anu], I notice that we don't have a basic bloom filter and prefix_extractor 
enabled on the OM metadata store. With those, I believe the range scan 
performance will be much better than what we have now. There are many other 
RocksDB tuning knobs for us to explore. 
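
For reference, a minimal sketch of what enabling those two knobs can look like 
with the RocksDB Java API (the values are illustrative and exact setter names 
can vary between RocksDB versions; this is not the actual OM metadata store 
setup):

{code}
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.BloomFilter;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class TunedStore {
  public static RocksDB open(String path) throws RocksDBException {
    RocksDB.loadLibrary();
    // Bloom filter (~10 bits per key) to skip SST files during point/range lookups.
    BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
        .setFilterPolicy(new BloomFilter(10));
    Options options = new Options()
        .setCreateIfMissing(true)
        .setTableFormatConfig(tableConfig)
        // Fixed-length prefix extractor so prefix range scans can use the filter.
        .useFixedLengthPrefixExtractor(16);
    return RocksDB.open(options, path);
  }
}
{code}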

> OMMetadataManagerLock is held by getPendingDeletionKeys for a full table scan
> -
>
> Key: HDDS-296
> URL: https://issues.apache.org/jira/browse/HDDS-296
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Anu Engineer
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: local.png
>
>
> We identified the problem during freon tests on real clusters. First I saw it 
> on a kubernetes-based pseudo cluster (50 datanodes, 1 freon). After a while 
> the rate of key allocation slowed down (see the attached image).
> I could also reproduce the problem with a local cluster (I used the 
> hadoop-dist/target/compose/ozoneperf setup). After the first 1 million keys 
> the key creation almost stopped.
> With the help of [~nandakumar131] we identified that the problem is the lock 
> in the ozone manager. (We profiled the OM with VisualVM and found that the 
> code is locked for an extremely long time; we also checked the rocksdb/rpc 
> metrics from prometheus and everything else worked well.)
> [~nandakumar131] suggested using an instrumented lock in the OMMetadataManager. 
> With a custom build we identified that the problem is that the deletion 
> service holds the OMMetadataManager lock for a full range scan. For 1 million 
> keys it took about 10 seconds (with my local developer machine + ssd).
> {code}
> ozoneManager_1  | 2018-07-25 12:45:03 WARN  OMMetadataManager:143 - Lock held 
> time above threshold: lock identifier: OMMetadataManagerLock 
> lockHeldTimeMs=2648 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> ozoneManager_1  | 
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> ozoneManager_1  | 
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> ozoneManager_1  | 
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> ozoneManager_1  | 
> org.apache.hadoop.util.InstrumentedReadLock.unlock(InstrumentedReadLock.java:78)
> ozoneManager_1  | 
> org.apache.hadoop.ozone.om.KeyManagerImpl.getPendingDeletionKeys(KeyManagerImpl.java:506)
> ozoneManager_1  | 
> org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:98)
> ozoneManager_1  | 
> org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:85)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> ozoneManager_1  | 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> ozoneManager_1  | 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ozoneManager_1  | 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ozoneManager_1  | java.lang.Thread.run(Thread.java:748)
> {code}
> I checked it with the DeletionService disabled and it worked well.
> The deletion service should be improved to work without long-term locking.
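
As a generic illustration of the direction described above (hypothetical names, 
not the actual OM deletion service code), the scan can hold the lock per batch 
instead of for the whole table:

{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy example of batching a scan so the lock is never held for the whole table. */
public class BatchedDeletionService {
  private static final int BATCH_SIZE = 1000;
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  public List<String> collectPendingDeletionKeys(Iterable<String> deletedKeyTable) {
    List<String> result = new ArrayList<>();
    Iterator<String> it = deletedKeyTable.iterator();
    while (true) {
      lock.readLock().lock();               // hold the lock only for one batch
      try {
        int n = 0;
        while (it.hasNext() && n < BATCH_SIZE) {
          result.add(it.next());
          n++;
        }
        if (!it.hasNext()) {
          return result;
        }
      } finally {
        lock.readLock().unlock();           // give writers a chance between batches
      }
    }
  }
}
{code}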






[jira] [Commented] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559236#comment-16559236
 ] 

genericqa commented on HDFS-13769:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  4m  
6s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 1s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m  
4s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 28m  4s{color} 
| {color:red} root generated 4 new + 1468 unchanged - 0 fixed = 1472 total (was 
1468) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 34s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
30s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}124m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13769 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933177/HDFS-13769.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 930a95974683 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8d3c068 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| javac | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24664/artifact/out/diff-compile-javac-root.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24664/testReport/ |
| Max. process+thread count | 1428 (vs. ulimit of 1) |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24664/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Namenode 

[jira] [Updated] (HDDS-283) Need an option to list all volumes created in the cluster

2018-07-26 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-283:

Attachment: HDDS-283.001.patch

> Need an option to list all volumes created in the cluster
> -
>
> Key: HDDS-283
> URL: https://issues.apache.org/jira/browse/HDDS-283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-283.001.patch
>
>
> Currently, the listVolume command gives either:
> 1) all the volumes created by a particular user, using the -user argument, or
> 2) all the volumes created by the logged-in user, if no -user argument 
> is provided.
>  
> We need an option to list all the volumes created in the cluster.






[jira] [Updated] (HDDS-283) Need an option to list all volumes created in the cluster

2018-07-26 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-283:

Status: Patch Available  (was: Open)

> Need an option to list all volumes created in the cluster
> -
>
> Key: HDDS-283
> URL: https://issues.apache.org/jira/browse/HDDS-283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-283.001.patch
>
>
> Currently, the listVolume command gives either:
> 1) all the volumes created by a particular user, using the -user argument, or
> 2) all the volumes created by the logged-in user, if no -user argument 
> is provided.
>  
> We need an option to list all the volumes created in the cluster.






[jira] [Commented] (HDFS-13658) fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 1 replica

2018-07-26 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559183#comment-16559183
 ] 

Xiao Chen commented on HDFS-13658:
--

Thanks for the continued work on this, Kitti! I think this is pretty close; some 
comments:
- For EC blocks, low redundancy is not about having 0 or 1 replicas. I guess we 
could borrow from the comment of {{LowRedundancyBlocks#getPriorityStriped}} and 
call it 'at highest risk of loss'.
- {{ECBlockGroupStats}} and {{ReplicatedBlockStats}} are public, so we cannot 
change the constructors. We can either add an overloaded ctor or use the builder 
pattern if you want (a rough sketch follows this list).
- I don't feel strongly about the new fsck option, but for cleanness I propose 
we do the metrics work here and split that out to another jira. My take is that 
with the new stats admins can get the information they want, and the fsck flag 
seems to add limited value.
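
A rough sketch of the compatibility-friendly options from the second bullet, 
using a made-up stats class rather than the real {{ReplicatedBlockStats}} or 
{{ECBlockGroupStats}} signatures:

{code}
/** Hypothetical stats holder; shows adding a field without breaking a public constructor. */
public class BlockStatsExample {
  private final long lowRedundancyBlocks;
  private final long missingBlocks;
  private final long blocksWithOneReplica;   // newly added counter

  // Existing public constructor stays untouched for compatibility.
  public BlockStatsExample(long lowRedundancyBlocks, long missingBlocks) {
    this(lowRedundancyBlocks, missingBlocks, 0L);
  }

  // Option 1: an overloaded constructor carrying the new counter.
  public BlockStatsExample(long lowRedundancyBlocks, long missingBlocks,
                           long blocksWithOneReplica) {
    this.lowRedundancyBlocks = lowRedundancyBlocks;
    this.missingBlocks = missingBlocks;
    this.blocksWithOneReplica = blocksWithOneReplica;
  }

  // Option 2: a builder, which scales better if more fields are added later.
  public static class Builder {
    private long lowRedundancyBlocks;
    private long missingBlocks;
    private long blocksWithOneReplica;

    public Builder setLowRedundancyBlocks(long v) { this.lowRedundancyBlocks = v; return this; }
    public Builder setMissingBlocks(long v) { this.missingBlocks = v; return this; }
    public Builder setBlocksWithOneReplica(long v) { this.blocksWithOneReplica = v; return this; }

    public BlockStatsExample build() {
      return new BlockStatsExample(lowRedundancyBlocks, missingBlocks, blocksWithOneReplica);
    }
  }
}
{code}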

> fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 
> 1 replica
> ---
>
> Key: HDFS-13658
> URL: https://issues.apache.org/jira/browse/HDFS-13658
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13658.001.patch, HDFS-13658.002.patch, 
> HDFS-13658.003.patch, HDFS-13658.004.patch, HDFS-13658.005.patch, 
> HDFS-13658.006.patch, HDFS-13658.007.patch, HDFS-13658.008.patch
>
>
> fsck, dfsadmin -report, and the NN WebUI should report the number of blocks 
> that have 1 replica. We have had many cases opened in which a customer lost a 
> disk or a DN and lost files/blocks because they had blocks with only 1 
> replica. We need to make customers better aware of this situation and that 
> they should take action.






[jira] [Comment Edited] (HDDS-226) Client should update block length in OM while committing the key

2018-07-26 Thread Mukul Kumar Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559179#comment-16559179
 ] 

Mukul Kumar Singh edited comment on HDDS-226 at 7/27/18 3:01 AM:
-

Thanks for working on this, [~shashikant]. Please find my comments below.

1) OmKeyInfo#updateBlockLength, there are 2 for loops. Normally, the order of 
the blocks in the ksmKeyLocations and in the blockIDList will be the same, so I 
feel this can be optimized by walking the list only once. Also, once we have 
found a match, we should break out of the first loop (see the sketch below).

2) Also we have a DatanodeBlockID in DatanodeContainerProto, the block length 
is not an argument there. Should this be updated as well ?
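
A minimal sketch of the single-pass idea from comment 1, with stand-in types 
instead of the real OmKeyInfo/ksmKeyLocations classes:

{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy single-pass length update; BlockInfo stands in for the real location/block types. */
public class BlockLengthUpdater {
  public static class BlockInfo {
    final long blockId;
    long length;
    BlockInfo(long blockId, long length) { this.blockId = blockId; this.length = length; }
  }

  /** Updates keyLocations in O(n + m) instead of a nested loop over both lists. */
  public static void updateBlockLengths(List<BlockInfo> keyLocations,
                                        List<BlockInfo> reportedBlocks) {
    // Index the reported lengths by block id once.
    Map<Long, Long> reportedLength = new HashMap<>();
    for (BlockInfo b : reportedBlocks) {
      reportedLength.put(b.blockId, b.length);
    }
    // Walk the key's locations once; no inner scan per location.
    for (BlockInfo location : keyLocations) {
      Long len = reportedLength.get(location.blockId);
      if (len != null) {
        location.length = len;
      }
    }
  }
}
{code}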


was (Author: msingh):
Thanks for working on this  [~shashikant]. Apart from Ni

1) OmKeyInfo#updateBlockLength, there are 2 for loops. Normally, the order of 
the blocks in the ksmKeyLocations and in the blockIDList will be the same, so I 
feel this can be optimized by walking the list only once. Also once we have 
found a match, we should break from the first loop.

2) Also we have a DatanodeBlockID in DatanodeContainerProto, the block length 
is not an argument there. Should this be updated as well ?

> Client should update block length in OM while committing the key
> 
>
> Key: HDDS-226
> URL: https://issues.apache.org/jira/browse/HDDS-226
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-226.00.patch, HDDS-226.01.patch, HDDS-226.02.patch, 
> HDDS-226.03.patch, HDDS-226.04.patch, HDDS-226.05.patch
>
>
> Currently the client allocates a key sized to the SCM block size; however, a 
> client can always write a smaller amount of data and close the key. The block 
> length in this case should be updated in OM.






[jira] [Comment Edited] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-26 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559171#comment-16559171
 ] 

Yiqun Lin edited comment on HDFS-13769 at 7/27/18 3:01 AM:
---

{quote}
Also clear checkpoint in trash is a typical situation of deleting a large dir, 
since the checkpoint dir of trash accumulates deleted files within several 
hours.
{quote}
Agree. We have also met this problem. There is a big chance of the checkpoint 
dir being a large dir. As [~kihwal] mentioned, the safe deletion might not be 
an atomic operation, but it should be okay to use for clearing the trash dir.

{quote}
Wei-Chiu Chuang, Agree! getContentSummary is a recursive method and it may take 
several seconds if the dir is very large. getContentSummary holds the read-lock 
in FSNameSystem rather than the write-lock. Also we need a way to know whether 
a dir is large. If there is a better solution I don't know, please tell me, and 
I think it need not to be very accurate.
{quote}
I am thinking that, for this, we can skip invoking the expensive 
{{getContentSummary}} call for the first-level dir, since there is a big chance 
it is a large dir. For the deeper children paths, we can do as the current patch 
does. I think this might be a better way. A sketch of this idea follows.
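
A rough sketch of this idea against the public FileSystem API (a simplified 
stand-in for the patch, with an arbitrary item threshold): skip 
getContentSummary on the trash checkpoint itself and delete it child by child, 
so that no single delete RPC covers the whole tree:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Simplified illustration, not the actual TrashPolicy patch. */
public class IncrementalTrashDelete {
  private static final long SAFE_DELETE_ITEM_LIMIT = 100_000; // illustrative threshold

  public static void deleteCheckpoint(FileSystem fs, Path checkpoint) throws IOException {
    // Assume the checkpoint itself is large: never summarize or delete it in one RPC.
    for (FileStatus child : fs.listStatus(checkpoint)) {
      if (child.isDirectory()) {
        ContentSummary cs = fs.getContentSummary(child.getPath());
        if (cs.getFileCount() + cs.getDirectoryCount() > SAFE_DELETE_ITEM_LIMIT) {
          // Still too big: recurse so no single delete RPC covers the whole subtree.
          deleteCheckpoint(fs, child.getPath());
          continue;
        }
      }
      fs.delete(child.getPath(), true);   // small enough for a single delete RPC
    }
    fs.delete(checkpoint, true);          // finally remove the emptied checkpoint dir
  }
}
{code}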


was (Author: linyiqun):
{quote}
Also clear checkpoint in trash is a typical situation of deleting a large dir, 
since the checkpoint dir of trash accumulates deleted files within several 
hours.
{quote}
Agree. We also met this problem. There is a big chance the checkpoint dir being 
a large dir. As [~kihwal] mentioned, for the safe deleting, it might not be a 
atomic operation. But it should be okay using for clearing trash dir.

{quote}
Wei-Chiu Chuang, Agree! getContentSummary is a recursive method and it may take 
several seconds if the dir is very large. getContentSummary holds the read-lock 
in FSNameSystem rather than the write-lock. Also we need a way to know whether 
a dir is large. If there is a better solution I don't know, please tell me, and 
I think it need not to be very accurate.
{quote}
I am thinking for this, We can skip invoking expensive call 
{{getContentSummary}} for in first level dir since there will be a large chance 
as a big dir. For the child paths, we can do as current patch did. This might a 
better way I think.

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for a 
> long time when deleting a trash dir with a large amount of data. We found this 
> log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implement a trashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many files.
> Any thought? [~linyiqun]






[jira] [Commented] (HDDS-226) Client should update block length in OM while committing the key

2018-07-26 Thread Mukul Kumar Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559179#comment-16559179
 ] 

Mukul Kumar Singh commented on HDDS-226:


Thanks for working on this  [~shashikant]. Apart from Ni

1) OmKeyInfo#updateBlockLength, there are 2 for loops. Normally, the order of 
the blocks in the ksmKeyLocations and in the blockIDList will be the same, so I 
feel this can be optimized by walking the list only once. Also once we have 
found a match, we should break from the first loop.

2) Also we have a DatanodeBlockID in DatanodeContainerProto, the block length 
is not an argument there. Should this be updated as well ?

> Client should update block length in OM while committing the key
> 
>
> Key: HDDS-226
> URL: https://issues.apache.org/jira/browse/HDDS-226
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-226.00.patch, HDDS-226.01.patch, HDDS-226.02.patch, 
> HDDS-226.03.patch, HDDS-226.04.patch, HDDS-226.05.patch
>
>
> Currently the client allocates a key sized to the SCM block size; however, a 
> client can always write a smaller amount of data and close the key. The block 
> length in this case should be updated in OM.






[jira] [Comment Edited] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-26 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559171#comment-16559171
 ] 

Yiqun Lin edited comment on HDFS-13769 at 7/27/18 2:58 AM:
---

{quote}
Also clear checkpoint in trash is a typical situation of deleting a large dir, 
since the checkpoint dir of trash accumulates deleted files within several 
hours.
{quote}
Agree. We have also met this problem. There is a big chance of the checkpoint 
dir being a large dir. As [~kihwal] mentioned, the safe deletion might not be 
an atomic operation, but it should be okay to use for clearing the trash dir.

{quote}
Wei-Chiu Chuang, Agree! getContentSummary is a recursive method and it may take 
several seconds if the dir is very large. getContentSummary holds the read-lock 
in FSNameSystem rather than the write-lock. Also we need a way to know whether 
a dir is large. If there is a better solution I don't know, please tell me, and 
I think it need not to be very accurate.
{quote}
I am thinking that, for this, we can skip invoking the expensive 
{{getContentSummary}} call for the first-level dir, since there is a large 
chance it is a big dir. For the child paths, we can do as the current patch 
does. I think this might be a better way.


was (Author: linyiqun):
{quote}
Also clear checkpoint in trash is a typical situation of deleting a large dir, 
since the checkpoint dir of trash accumulates deleted files within several 
hours.
{quote}
Agree. We also met this problem. There is a big chance the checkpoint dir being 
a large dir. As [~kihwal] mentioned, for the safe deleting, it might not be a 
atomic operation. But it should be okay using for clearing trash dir.

{quote}
Wei-Chiu Chuang, Agree! getContentSummary is a recursive method and it may take 
several seconds if the dir is very large. getContentSummary holds the read-lock 
in FSNameSystem rather than the write-lock. Also we need a way to know whether 
a dir is large. If there is a better solution I don't know, please tell me, and 
I think it need not to be very accurate.
{quote}
I am thinking for this, we don't really need a limitation value 
{{FS_TRASH_SAFE_DELETE_ITEM_LIMIT_KEY}}. I mean if users enable the 
safe-deleteion trash policy, we are assuming the trash dir will have a big 
chance being a large dir. And we just use safe deletion way in 
{{deleteTrashInternal#safeDelete}}. And no need to invoke expensive  call 
{{getContentSummary}} to get the counts.

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for a 
> long time when deleting a trash dir with a large amount of data. We found this 
> log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implement a trashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many files.
> Any thought? [~linyiqun]






[jira] [Updated] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-26 Thread Yiqun Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-13769:
-
Status: Patch Available  (was: Open)

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.0, 2.8.2
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for a 
> long time when deleting a trash dir with a large amount of data. We found this 
> log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implement a trashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many files.
> Any thought? [~linyiqun]






[jira] [Commented] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-26 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559171#comment-16559171
 ] 

Yiqun Lin commented on HDFS-13769:
--

{quote}
Also clear checkpoint in trash is a typical situation of deleting a large dir, 
since the checkpoint dir of trash accumulates deleted files within several 
hours.
{quote}
Agree. We have also met this problem. There is a big chance of the checkpoint 
dir being a large dir. As [~kihwal] mentioned, the safe deletion might not be 
an atomic operation, but it should be okay to use for clearing the trash dir.

{quote}
Wei-Chiu Chuang, Agree! getContentSummary is a recursive method and it may take 
several seconds if the dir is very large. getContentSummary holds the read-lock 
in FSNameSystem rather than the write-lock. Also we need a way to know whether 
a dir is large. If there is a better solution I don't know, please tell me, and 
I think it need not to be very accurate.
{quote}
I am thinking that, for this, we don't really need a limit value 
{{FS_TRASH_SAFE_DELETE_ITEM_LIMIT_KEY}}. I mean, if users enable the 
safe-deletion trash policy, we are assuming there is a big chance the trash dir 
is a large dir, so we can just use the safe deletion path in 
{{deleteTrashInternal#safeDelete}}, and there is no need to invoke the expensive 
{{getContentSummary}} call to get the counts.

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for a 
> long time when deleting a trash dir with a large amount of data. We found this 
> log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implement a trashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many files.
> Any thought? [~linyiqun]






[jira] [Commented] (HDFS-13767) Add msync server implementation.

2018-07-26 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559149#comment-16559149
 ] 

Konstantin Shvachko commented on HDFS-13767:


I like the approach. I don't think preserving the ordering is needed. It seems 
you need to put in some more work to complete this.
Maybe change {{receiveRequestState(header)}} to return {{clientStateId}}, so 
that you can pass it into {{RpcCall}}, which is then retrieved in 
{{Handler.run()}} to verify whether the SBN has already caught up. BTW, you can 
also incorporate the {{AC.isAlwaysRecent()}} logic inside 
{{receiveRequestState()}} instead of adding a new method.
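
A heavily simplified, hypothetical sketch of that flow (these are not the real 
{{RpcCall}}/{{Handler}} classes; it only illustrates carrying the client state 
id with the call and checking it before execution):

{code}
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of deferring a call until the standby has applied the client's state id. */
public class StateIdGate {
  private final AtomicLong lastAppliedStateId = new AtomicLong(0);

  /** Called by the edit-tailing thread as transactions are applied. */
  public void advanceTo(long stateId) {
    lastAppliedStateId.set(stateId);
    synchronized (this) {
      notifyAll();
    }
  }

  /** Stand-in for the check a handler would do before running the queued call. */
  public void awaitCaughtUp(long clientStateId) throws InterruptedException {
    synchronized (this) {
      while (lastAppliedStateId.get() < clientStateId) {
        wait(100);   // re-check periodically; real code would requeue instead of blocking
      }
    }
  }

  /** A call record carrying the client state id parsed from the RPC header. */
  public static class Call {
    final long clientStateId;
    final Runnable body;
    Call(long clientStateId, Runnable body) {
      this.clientStateId = clientStateId;
      this.body = body;
    }
  }

  public void run(Call call) throws InterruptedException {
    awaitCaughtUp(call.clientStateId);   // verify the SBN has caught up before executing
    call.body.run();
  }
}
{code}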

> Add msync server implementation.
> 
>
> Key: HDFS-13767
> URL: https://issues.apache.org/jira/browse/HDFS-13767
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13767.WIP.001.patch, HDFS-13767.WIP.002.patch
>
>
> This is a follow-up on HDFS-13688, where the msync API is introduced to 
> {{ClientProtocol}} but the server-side implementation is missing. This 
> Jira is to implement the server-side logic.






[jira] [Commented] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-26 Thread Tao Jie (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559125#comment-16559125
 ] 

Tao Jie commented on HDFS-13769:


[~csun], I agree with [~kihwal]. We cannot use this logic in the default 
delete operation, since it breaks the existing delete semantics. However, we can 
use this logic in trash deletion, which has fewer side effects. Also, clearing a 
checkpoint in trash is a typical case of deleting a large dir, since the 
checkpoint dir of the trash accumulates several hours of deleted files.

[~jojochuang], Agree! {{getContentSummary}} is a recursive method and it may 
take several seconds if the dir is very large. {{getContentSummary}} holds the 
read-lock in {{FSNameSystem}} rather than the write-lock. Also, we need a way 
to know whether a dir is large. If there is a better solution that I don't know 
of, please tell me; I think it need not be very accurate.

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for a 
> long time when deleting a trash dir with a large amount of data. We found this 
> log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implement a trashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many files.
> Any thought? [~linyiqun]






[jira] [Updated] (HDDS-270) Move generic container util functions to ContainerUtils

2018-07-26 Thread Bharat Viswanadham (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-270:

Fix Version/s: 0.2.1

> Move generic container util functions to ContainerUtils
> ---
>
> Key: HDDS-270
> URL: https://issues.apache.org/jira/browse/HDDS-270
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-270.001.patch
>
>
> Some container util functions, such as getContainerFile(), are common for all 
> ContainerTypes. These functions should be moved to ContainerUtils.
> Also move some functions to KeyValueContainer as applicable.
>  






[jira] [Created] (HDFS-13771) enableManagedDfsDirsRedundancy typo in creating MiniDFSCluster

2018-07-26 Thread wilderchen (JIRA)
wilderchen created HDFS-13771:
-

 Summary: enableManagedDfsDirsRedundancy typo in creating 
MiniDFSCluster
 Key: HDFS-13771
 URL: https://issues.apache.org/jira/browse/HDFS-13771
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.0.3
Reporter: wilderchen


There is a typo (a wrong parameter) in the call to "initNameNodeConf" from 
"configureNameService" in the file 
"hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java".
 * prototype of initNameNodeConf is "void initNameNodeConf(Configuration conf, 
String nameserviceId, int nsIndex, String nnId, boolean manageNameDfsDirs, 
boolean enableManagedDfsDirsRedundancy, int nnIndex)"
 * the function call of initNameNodeConf in configureNameService is 
"initNameNodeConf(conf, nsId, nsCounter, nn.getNnId(), manageNameDfsDirs, 
manageNameDfsDirs,  nnIndex)"
 * expect function call to be "initNameNodeConf(conf, nsId, nsCounter, 
nn.getNnId(), manageNameDfsDirs, enableManagedDfsDirsRedundancy,  nnIndex)"






[jira] [Commented] (HDDS-270) Move generic container util functions to ContainerUtils

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559064#comment-16559064
 ] 

genericqa commented on HDDS-270:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 35s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  0s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
36s{color} | {color:red} hadoop-hdds_container-service generated 1 new + 3 
unchanged - 0 fixed = 4 total (was 3) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
45s{color} | {color:green} container-service in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m  
6s{color} | {color:green} integration-test in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}122m 31s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-270 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933275/HDDS-270.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0df7f0871355 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 

[jira] [Commented] (HDDS-268) Add SCM close container watcher

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559011#comment-16559011
 ] 

genericqa commented on HDDS-268:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
34s{color} | {color:red} hadoop-hdds/server-scm in trunk has 1 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
29s{color} | {color:green} framework in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m 26s{color} 
| {color:red} server-scm in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 48s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.container.TestCloseContainerWatcher |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-268 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933269/HDDS-268.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux b8b394062ada 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d70d845 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| findbugs | 

[jira] [Commented] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-26 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558999#comment-16558999
 ] 

Wei-Chiu Chuang commented on HDFS-13769:


{code}
ContentSummary cs = fs.getContentSummary(path);
{code}
is recursive in nature, so iterating on a big directory can be slow (probably 
not as slow as recursive delete). You should avoid calling it if possible.

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for a 
> long time when deleting a trash dir with a large amount of data. We found this 
> log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implement a trashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many files.
> Any thought? [~linyiqun]






[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider at creation time for consistent UGI handling

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558993#comment-16558993
 ] 

genericqa commented on HDFS-13697:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 15s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
11s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
29s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
10s{color} | {color:green} hadoop-kms in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
47s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}105m 18s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}254m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.datanode.TestDataNodeMXBean |
|   | hadoop.fs.viewfs.TestViewFileSystemHdfs |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.client.impl.TestBlockReaderLocal |
|   | hadoop.hdfs.TestErasureCodingExerciseAPIs |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce 

[jira] [Comment Edited] (HDDS-296) OMMetadataManagerLock is hold by getPendingDeletionKeys for a full table scan

2018-07-26 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558977#comment-16558977
 ] 

Anu Engineer edited comment on HDDS-296 at 7/26/18 10:24 PM:
-

[~elek]/[~nandakumar131] Thanks for root-causing this issue. I will take care 
of this; we cannot have a release without this getting fixed.

The reason I want to fix this is that this issue is just a symptom: we have 
these range scans in other places in the code too, and OM has not gotten as 
much love as SCM :)


was (Author: anu):
[~elek]/[~nandakumar131] Thanks for root-causing this issue. I will take care 
of this; we cannot have a release without this getting fixed.


> OMMetadataManagerLock is hold by getPendingDeletionKeys for a full table scan
> -
>
> Key: HDDS-296
> URL: https://issues.apache.org/jira/browse/HDDS-296
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Anu Engineer
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: local.png
>
>
> We identified the problem during freon tests on real clusters. First I saw it 
> on a kubernetes based pseudo cluster (50 datanodes, 1 freon). After a while 
> the rate of key allocation slowed down. (See the attached image.)
> I could also reproduce the problem with a local cluster (I used the 
> hadoop-dist/target/compose/ozoneperf setup). After the first 1 million keys 
> the key creation almost stopped.
> With the help of [~nandakumar131] we identified that the problem is the lock 
> in the ozone manager. (We profiled the OM with VisualVM and found that the 
> code is locked for an extremely long time; we also checked the rocksdb/rpc 
> metrics from prometheus and everything else worked well.)
> [~nandakumar131] suggested using an instrumented lock in the OMMetadataManager. 
> With a custom build we identified that the problem is that the deletion 
> service holds the OMMetadataManager lock for a full range scan. For 1 million 
> keys it took about 10 seconds (with my local developer machine + ssd).
> {code}
> ozoneManager_1  | 2018-07-25 12:45:03 WARN  OMMetadataManager:143 - Lock held 
> time above threshold: lock identifier: OMMetadataManagerLock 
> lockHeldTimeMs=2648 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> ozoneManager_1  | 
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> ozoneManager_1  | 
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> ozoneManager_1  | 
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> ozoneManager_1  | 
> org.apache.hadoop.util.InstrumentedReadLock.unlock(InstrumentedReadLock.java:78)
> ozoneManager_1  | 
> org.apache.hadoop.ozone.om.KeyManagerImpl.getPendingDeletionKeys(KeyManagerImpl.java:506)
> ozoneManager_1  | 
> org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:98)
> ozoneManager_1  | 
> org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:85)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> ozoneManager_1  | 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> ozoneManager_1  | 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ozoneManager_1  | 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ozoneManager_1  | java.lang.Thread.run(Thread.java:748)
> {code}
> I checked with the DeletionService disabled and everything worked well.
> The deletion service should be improved so that it works without long-term 
> locking.
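
Not part of the reported issue text, but to make the direction concrete: a rough 
sketch of scanning the pending-deletion keys in bounded batches so that the 
OMMetadataManager lock is held only per batch. The store and lock types below 
are simplified stand-ins, not the actual OM classes.

{code:java}
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.function.Consumer;

public class BatchedPendingKeyScanner {

  /** Simplified stand-in for the key table holding pending-deletion keys. */
  interface DeletedKeyStore {
    /** Returns up to 'limit' pending-deletion keys strictly after 'startAfter'. */
    List<String> nextPendingKeys(String startAfter, int limit);
  }

  private static final int BATCH_SIZE = 1000; // illustrative value

  public static void scan(ReadWriteLock omLock, DeletedKeyStore store,
      Consumer<List<String>> deleter) {
    String last = null;
    while (true) {
      List<String> batch;
      omLock.readLock().lock();     // hold the lock only for one small batch
      try {
        batch = store.nextPendingKeys(last, BATCH_SIZE);
      } finally {
        omLock.readLock().unlock();
      }
      if (batch.isEmpty()) {
        return;
      }
      deleter.accept(batch);        // the heavy work happens outside the lock
      last = batch.get(batch.size() - 1);
    }
  }
}
{code}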



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-296) OMMetadataManagerLock is hold by getPendingDeletionKeys for a full table scan

2018-07-26 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558977#comment-16558977
 ] 

Anu Engineer commented on HDDS-296:
---

[~elek]/[~nandakumar131] Thanks for root-causing this issue. I will take care 
of this; we cannot have a release without this getting fixed.


> OMMetadataManagerLock is hold by getPendingDeletionKeys for a full table scan
> -
>
> Key: HDDS-296
> URL: https://issues.apache.org/jira/browse/HDDS-296
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: local.png
>
>
> We identified the problem during freon tests on real clusters. First I saw it 
> on a kubernetes based pseudo cluster (50 datanodes, 1 freon). After a while 
> the rate of key allocation slowed down. (See the attached image.)
> I could also reproduce the problem with a local cluster (I used the 
> hadoop-dist/target/compose/ozoneperf setup). After the first 1 million keys 
> the key creation almost stopped.
> With the help of [~nandakumar131] we identified that the problem is the lock 
> in the ozone manager. (We profiled the OM with VisualVM and found that the 
> code is locked for an extremely long time; we also checked the rocksdb/rpc 
> metrics from prometheus and everything else worked well.)
> [~nandakumar131] suggested using an instrumented lock in the OMMetadataManager. 
> With a custom build we identified that the problem is that the deletion 
> service holds the OMMetadataManager lock for a full range scan. For 1 million 
> keys it took about 10 seconds (with my local developer machine + ssd).
> {code}
> ozoneManager_1  | 2018-07-25 12:45:03 WARN  OMMetadataManager:143 - Lock held 
> time above threshold: lock identifier: OMMetadataManagerLock 
> lockHeldTimeMs=2648 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> ozoneManager_1  | 
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> ozoneManager_1  | 
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> ozoneManager_1  | 
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> ozoneManager_1  | 
> org.apache.hadoop.util.InstrumentedReadLock.unlock(InstrumentedReadLock.java:78)
> ozoneManager_1  | 
> org.apache.hadoop.ozone.om.KeyManagerImpl.getPendingDeletionKeys(KeyManagerImpl.java:506)
> ozoneManager_1  | 
> org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:98)
> ozoneManager_1  | 
> org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:85)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> ozoneManager_1  | 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> ozoneManager_1  | 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ozoneManager_1  | 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ozoneManager_1  | java.lang.Thread.run(Thread.java:748)
> {code}
> I checked with the DeletionService disabled and everything worked well.
> The deletion service should be improved so that it works without long-term 
> locking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-296) OMMetadataManagerLock is hold by getPendingDeletionKeys for a full table scan

2018-07-26 Thread Anu Engineer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer reassigned HDDS-296:
-

Assignee: Anu Engineer

> OMMetadataManagerLock is hold by getPendingDeletionKeys for a full table scan
> -
>
> Key: HDDS-296
> URL: https://issues.apache.org/jira/browse/HDDS-296
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Anu Engineer
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: local.png
>
>
> We identified the problem during freon tests on real clusters. First I saw it 
> on a kubernetes based pseudo cluster (50 datanodes, 1 freon). After a while 
> the rate of key allocation slowed down. (See the attached image.)
> I could also reproduce the problem with a local cluster (I used the 
> hadoop-dist/target/compose/ozoneperf setup). After the first 1 million keys 
> the key creation almost stopped.
> With the help of [~nandakumar131] we identified that the problem is the lock 
> in the ozone manager. (We profiled the OM with VisualVM and found that the 
> code is locked for an extremely long time; we also checked the rocksdb/rpc 
> metrics from prometheus and everything else worked well.)
> [~nandakumar131] suggested using an instrumented lock in the OMMetadataManager. 
> With a custom build we identified that the problem is that the deletion 
> service holds the OMMetadataManager lock for a full range scan. For 1 million 
> keys it took about 10 seconds (with my local developer machine + ssd).
> {code}
> ozoneManager_1  | 2018-07-25 12:45:03 WARN  OMMetadataManager:143 - Lock held 
> time above threshold: lock identifier: OMMetadataManagerLock 
> lockHeldTimeMs=2648 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> ozoneManager_1  | 
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> ozoneManager_1  | 
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> ozoneManager_1  | 
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> ozoneManager_1  | 
> org.apache.hadoop.util.InstrumentedReadLock.unlock(InstrumentedReadLock.java:78)
> ozoneManager_1  | 
> org.apache.hadoop.ozone.om.KeyManagerImpl.getPendingDeletionKeys(KeyManagerImpl.java:506)
> ozoneManager_1  | 
> org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:98)
> ozoneManager_1  | 
> org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:85)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> ozoneManager_1  | 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> ozoneManager_1  | 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ozoneManager_1  | 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ozoneManager_1  | java.lang.Thread.run(Thread.java:748)
> {code}
> I checked with the DeletionService disabled and everything worked well.
> The deletion service should be improved so that it works without long-term 
> locking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-270) Move generic container util functions to ContianerUtils

2018-07-26 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-270:

Summary: Move generic container util functions to ContianerUtils  (was: 
Move generic container utils to ContianerUitls)

> Move generic container util functions to ContianerUtils
> ---
>
> Key: HDDS-270
> URL: https://issues.apache.org/jira/browse/HDDS-270
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-270.001.patch
>
>
> Some container util functions such as getContainerFile() are common to all 
> ContainerTypes. These functions should be moved to ContainerUtils.
> Also move some functions to KeyValueContainer as applicable.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-270) Move generic container utils to ContianerUitls

2018-07-26 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-270:

Attachment: HDDS-270.001.patch

> Move generic container utils to ContianerUitls
> --
>
> Key: HDDS-270
> URL: https://issues.apache.org/jira/browse/HDDS-270
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-270.001.patch
>
>
> Some container util functions such as getContainerFile() are common to all 
> ContainerTypes. These functions should be moved to ContainerUtils.
> Also move some functions to KeyValueContainer as applicable.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-270) Move generic container utils to ContianerUitls

2018-07-26 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-270:

Status: Patch Available  (was: Open)

> Move generic container utils to ContianerUitls
> --
>
> Key: HDDS-270
> URL: https://issues.apache.org/jira/browse/HDDS-270
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-270.001.patch
>
>
> Some container util functions such as getContainerFile() are common to all 
> ContainerTypes. These functions should be moved to ContainerUtils.
> Also move some functions to KeyValueContainer as applicable.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-271) Create a block iterator to iterate blocks in a container

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558941#comment-16558941
 ] 

genericqa commented on HDDS-271:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
27s{color} | {color:red} hadoop-hdds_container-service generated 1 new + 3 
unchanged - 0 fixed = 4 total (was 3) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
57s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
38s{color} | {color:green} container-service in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-271 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933261/HDDS-271.04.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ef9d1f1f2a2e 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d70d845 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| javadoc | 

[jira] [Commented] (HDFS-13767) Add msync server implementation.

2018-07-26 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558934#comment-16558934
 ] 

Chen Liang commented on HDFS-13767:
---

Uploaded the WIP.002 patch. The main difference from the WIP.001 patch is that 
the logic to make sure calls from the same client keep the same processing 
order has been removed. Specifically, if a call has a state id larger than the 
server state id, the Handler will simply insert the call back into the 
callQueue and continue. As an example, say the callQueue has two calls from the 
same client, [1, 2]. Call 1 gets checked, but the server state id hasn't caught 
up, so 1 gets added back to the queue, making it [2, 1]. Then the server 
catches up to state id, say, 3. Call 2 gets checked and processed, then call 1. 
So the processing order becomes 2, 1.

But this is fine because even in the current Server logic there is no guarantee 
on the order: it is already possible that two handler threads pick up 1 and 2 
respectively and 2 finishes first. In fact, due to the synchronized nature of 
the API, there will only be multiple calls from the same client in the 
callQueue when the same client instance is used by multiple threads, and in 
that case there should be no expectation on ordering. Furthermore, this logic 
is for the Observer exclusively, which only handles reads. (Please correct me 
if I'm wrong on this.)
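
For reference, a much simplified sketch of the requeueing behaviour described 
above; the queue and call types are illustrative stand-ins, not the actual RPC 
Server/Handler classes:

{code:java}
import java.util.concurrent.BlockingQueue;

public class RequeueingHandlerSketch {

  /** Stand-in for an RPC call carrying the client's last-seen state id. */
  interface StateIdCall {
    long getClientStateId();
    void process() throws Exception;
  }

  static void handleNext(BlockingQueue<StateIdCall> callQueue,
      long serverStateId) throws Exception {
    StateIdCall call = callQueue.take();
    if (call.getClientStateId() > serverStateId) {
      // The server has not caught up yet: put the call back and move on.
      // Per-client ordering is not preserved, which is fine as argued above.
      callQueue.put(call);
      return;
    }
    call.process();
  }
}
{code}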

> Add msync server implementation.
> 
>
> Key: HDFS-13767
> URL: https://issues.apache.org/jira/browse/HDFS-13767
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13767.WIP.001.patch, HDFS-13767.WIP.002.patch
>
>
> This is a followup on HDFS-13688, where msync API is introduced to 
> {{ClientProtocol}} but the server side implementation is missing. This is 
> Jira is to implement the server side logic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-268) Add SCM close container watcher

2018-07-26 Thread Ajay Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558925#comment-16558925
 ] 

Ajay Kumar commented on HDDS-268:
-

Patch v1 to rebase with trunk and add license to {{TestCloseContainerWatcher}}.

> Add SCM close container watcher
> ---
>
> Key: HDDS-268
> URL: https://issues.apache.org/jira/browse/HDDS-268
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Ajay Kumar
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-268.00.patch, HDDS-268.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-268) Add SCM close container watcher

2018-07-26 Thread Ajay Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDDS-268:

Attachment: HDDS-268.01.patch

> Add SCM close container watcher
> ---
>
> Key: HDDS-268
> URL: https://issues.apache.org/jira/browse/HDDS-268
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Ajay Kumar
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-268.00.patch, HDDS-268.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13767) Add msync server implementation.

2018-07-26 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-13767:
--
Attachment: HDFS-13767.WIP.002.patch

> Add msync server implementation.
> 
>
> Key: HDFS-13767
> URL: https://issues.apache.org/jira/browse/HDFS-13767
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13767.WIP.001.patch, HDFS-13767.WIP.002.patch
>
>
> This is a followup on HDFS-13688, where msync API is introduced to 
> {{ClientProtocol}} but the server side implementation is missing. This is 
> Jira is to implement the server side logic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-226) Client should update block length in OM while committing the key

2018-07-26 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558923#comment-16558923
 ] 

Tsz Wo Nicholas Sze commented on HDDS-226:
--

Hi [~shashikant], it does not seem like a good idea to add blockLength to 
BlockID, since BlockID is used everywhere as an id. How about adding 
KeyLocation to KeyArgs?

> Client should update block length in OM while committing the key
> 
>
> Key: HDDS-226
> URL: https://issues.apache.org/jira/browse/HDDS-226
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-226.00.patch, HDDS-226.01.patch, HDDS-226.02.patch, 
> HDDS-226.03.patch, HDDS-226.04.patch, HDDS-226.05.patch
>
>
> Currently the client allocates a key with the SCM block size; however, a 
> client can always write a smaller amount of data and close the key. The block 
> length in this case should be updated in OM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-287) Add Close ContainerAction to Datanode#StateContext when the container gets full

2018-07-26 Thread Xiaoyu Yao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558920#comment-16558920
 ] 

Xiaoyu Yao commented on HDDS-287:
-

Thanks [~nandakumar131] for the patch. It looks good to me. +1

We will need to add the handler part in SCM to process 
ContainerAction.Action.CLOSE once HDDS-245 is in.

> Add Close ContainerAction to Datanode#StateContext when the container gets 
> full
> ---
>
> Key: HDDS-287
> URL: https://issues.apache.org/jira/browse/HDDS-287
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.2.1
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-287.000.patch
>
>
> Datanode has to send a Close ContainerAction to SCM whenever a container gets 
> full. {{Datanode#StateContext}} has a {{containerActions}} queue from which the 
> ContainerActions are picked up and sent as part of the heartbeat. In this jira 
> we have to add a ContainerAction to the StateContext whenever a container gets 
> full.
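
To make the intent concrete, a simplified stand-in for the StateContext / 
ContainerAction wiring described above (not the attached patch; the types and 
the full-check threshold are illustrative):

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class ContainerFullNotifier {

  enum Action { CLOSE }

  static class ContainerAction {
    final long containerId;
    final Action action;
    final String reason;

    ContainerAction(long containerId, Action action, String reason) {
      this.containerId = containerId;
      this.action = action;
      this.reason = reason;
    }
  }

  private final Queue<ContainerAction> containerActions =
      new ConcurrentLinkedQueue<>();

  /** Called from the write path after updating the container's used bytes. */
  void checkAndQueueClose(long containerId, long usedBytes, long maxSizeBytes) {
    if (usedBytes >= maxSizeBytes) {
      containerActions.add(
          new ContainerAction(containerId, Action.CLOSE, "CONTAINER_FULL"));
    }
  }

  /** Drained by the heartbeat sender. */
  Queue<ContainerAction> getContainerActions() {
    return containerActions;
  }
}
{code}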



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8131) Implement a space balanced block placement policy

2018-07-26 Thread Yongjun Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558894#comment-16558894
 ] 

Yongjun Zhang commented on HDFS-8131:
-

I just read HDFS-4946 and found it doesn't exactly do what I meant by comment 
#3 above.

HDFS-4946 introduced a config to enable/disable preferLocalDN; if disabled, the 
local DN is skipped for all applications.

Whereas when I wrote comment #3 above, I was thinking that when choosing the 
first DN, we could apply the same fix done here in HDFS-8131, so that we can 
choose either the local node or a remote one for the first DN, instead of 
always skipping the local DN.

Comments on this thought are welcome.
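
To illustrate what comment #3 has in mind (purely a sketch, not the actual 
placement-policy code): apply the same space-biased choice between the local 
node and a candidate remote node when picking the first replica. The node type 
and the preference fraction below are simplified stand-ins:

{code:java}
import java.util.Random;

public class SpaceBiasedFirstNodeChooser {

  /** Minimal stand-in for a datanode descriptor. */
  static class Node {
    final long remainingBytes;
    Node(long remainingBytes) { this.remainingBytes = remainingBytes; }
  }

  private final Random rand = new Random();
  private final float balancedPreference; // e.g. 0.6f favours the emptier node

  SpaceBiasedFirstNodeChooser(float balancedPreference) {
    this.balancedPreference = balancedPreference;
  }

  /** Picks the local or the remote node, biased toward the one with more free space. */
  Node chooseFirst(Node local, Node remote) {
    Node moreFree = local.remainingBytes >= remote.remainingBytes ? local : remote;
    Node lessFree = (moreFree == local) ? remote : local;
    return rand.nextFloat() < balancedPreference ? moreFree : lessFree;
  }
}
{code}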

 

 

> Implement a space balanced block placement policy
> -
>
> Key: HDFS-8131
> URL: https://issues.apache.org/jira/browse/HDFS-8131
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
>  Labels: BlockPlacementPolicy
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: HDFS-8131-branch-2.7.patch, HDFS-8131-v1.diff, 
> HDFS-8131-v2.diff, HDFS-8131-v3.diff, HDFS-8131.004.patch, 
> HDFS-8131.005.patch, HDFS-8131.006.patch, balanced.png
>
>
> The default block placement policy chooses datanodes for new blocks randomly, 
> which results in an unbalanced space-used percentage among datanodes after a 
> cluster expansion. The old datanodes always sit at a high space-used 
> percentage while the newly added ones sit at a low percentage.
> Though we can use the external balancer tool to even out the space usage, it 
> costs extra network IO and it's not easy to control the balancing speed.
> An easy solution is to implement a balanced block placement policy which 
> chooses datanodes with a low used percentage for new blocks with a slightly 
> higher probability. Before long, the used percentage of the datanodes will 
> trend toward being balanced.
> Suggestions and discussions are welcome. Thanks



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-291) Initialize hadoop metrics system in standalone hdds datanodes

2018-07-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558886#comment-16558886
 ] 

Hudson commented on HDDS-291:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14648 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14648/])
HDDS-291. Initialize hadoop metrics system in standalone hdds datanodes. (xyao: 
rev d70d84570575574b7e3ad0f00baf54f1dde76d97)
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/SCMConnectionManager.java


> Initialize hadoop metrics system in standalone hdds datanodes
> -
>
> Key: HDDS-291
> URL: https://issues.apache.org/jira/browse/HDDS-291
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Minor
> Fix For: 0.2.1
>
> Attachments: HDDS-291.001.patch
>
>
> Since HDDS-94 we can start a standalone HDDS datanode process without the 
> HDFS datanode parts.
> But to see the hadoop metrics over the JMX interface we need to initialize 
> the hadoop metrics system (we already have metrics from the storage IO layer).
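
The commit above touches HddsDatanodeService; the core of the change is 
presumably along these lines (a sketch, not the exact patch; the instance name 
is illustrative):

{code:java}
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

public class HddsDatanodeMetricsInit {
  public static void main(String[] args) {
    // Register the default metrics system so the existing metrics sources
    // become visible over JMX in a standalone HDDS datanode.
    DefaultMetricsSystem.initialize("HddsDatanode");
  }
}
{code}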



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13770) dfsadmin -report does not always decrease "missing blocks (with replication factor 1)" metrics when file is deleted

2018-07-26 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558883#comment-16558883
 ] 

Xiao Chen commented on HDFS-13770:
--

Thanks Kitti for identifying this and providing a fix! The patch looks pretty 
good, some minor comments:
- We can extract a method {{decrementBlockStat}} in 
{{UnderReplicatedBlocks#remove}} for less duplication (a rough sketch follows 
this comment).
- We can tidy up the new 3-param {{remove}}: make it private, and point its 
javadoc to the 2-param one. Something like:
{code}* For details, see {@link #remove(BlockInfo, int)}  {code} and 
explain the difference only (i.e. how oldExpectedReplicas is used).
- The original javadoc had a typo: s/attmpted/attempted/g.
- The test should have a timeout.
- Do you think it's helpful to add a few other sanity tests in the same test 
case? For example, an oldExpectedReplicas of 2 doesn't trigger a counter 
decrease. From the code it's pretty clear, so this is really just adding some 
extra coverage. Up to you. :)
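
A rough sketch of the shape of that extraction (field and constant names follow 
the code quoted in the description below; this is not the actual patch):

{code:java}
class UnderReplicatedBlocksSketch {
  static final int QUEUE_WITH_CORRUPT_BLOCKS = 4; // index assumed for the sketch
  private int corruptReplOneBlocks;

  private void decrementBlockStat(int priLevel, int oldExpectedReplicas) {
    if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS && oldExpectedReplicas == 1) {
      corruptReplOneBlocks--;
      assert corruptReplOneBlocks >= 0 :
          "Number of corrupt blocks with replication factor 1 " +
          "should be non-negative";
    }
  }

  // Both remove() variants would then call, after a successful removal:
  //   if (removedBlock) {
  //     decrementBlockStat(priLevel, oldExpectedReplicas);
  //   }
}
{code}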

> dfsadmin -report does not always decrease "missing blocks (with replication 
> factor 1)" metrics when file is deleted
> ---
>
> Key: HDFS-13770
> URL: https://issues.apache.org/jira/browse/HDFS-13770
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.7
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13770-branch-2.001.patch
>
>
> The missing blocks (with replication factor 1) metric is not always decreased 
> when a file is deleted.
> If a file is deleted, the remove function of UnderReplicatedBlocks can be 
> called with the wrong priority (UnderReplicatedBlocks.LEVEL). If it is called 
> with the wrong priority, the corruptReplOneBlocks metric is not decreased, but 
> the block is still removed from the priority queue which contains it.
> The corresponding code:
> {code:java}
> /** remove a block from a under replication queue */
> synchronized boolean remove(BlockInfo block,
>     int oldReplicas,
>     int oldReadOnlyReplicas,
>     int decommissionedReplicas,
>     int oldExpectedReplicas) {
>   final int priLevel = getPriority(oldReplicas, oldReadOnlyReplicas,
>       decommissionedReplicas, oldExpectedReplicas);
>   boolean removedBlock = remove(block, priLevel);
>   if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS &&
>       oldExpectedReplicas == 1 &&
>       removedBlock) {
>     corruptReplOneBlocks--;
>     assert corruptReplOneBlocks >= 0 :
>         "Number of corrupt blocks with replication factor 1 " +
>         "should be non-negative";
>   }
>   return removedBlock;
> }
>
> /**
>  * Remove a block from the under replication queues.
>  *
>  * The priLevel parameter is a hint of which queue to query
>  * first: if negative or >= {@link #LEVEL} this shortcutting
>  * is not attmpted.
>  *
>  * If the block is not found in the nominated queue, an attempt is made to
>  * remove it from all queues.
>  *
>  * Warning: This is not a synchronized method.
>  * @param block block to remove
>  * @param priLevel expected privilege level
>  * @return true if the block was found and removed from one of the priority
>  *         queues
>  */
> boolean remove(BlockInfo block, int priLevel) {
>   if (priLevel >= 0 && priLevel < LEVEL
>       && priorityQueues.get(priLevel).remove(block)) {
>     NameNode.blockStateChangeLog.debug(
>         "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block {}" +
>         " from priority queue {}", block, priLevel);
>     return true;
>   } else {
>     // Try to remove the block from all queues if the block was
>     // not found in the queue for the given priority level.
>     for (int i = 0; i < LEVEL; i++) {
>       if (i != priLevel && priorityQueues.get(i).remove(block)) {
>         NameNode.blockStateChangeLog.debug(
>             "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block" +
>             " {} from priority queue {}", block, i);
>         return true;
>       }
>     }
>   }
>   return false;
> }
> {code}
> It is already fixed on trunk by HDFS-10999, but that ticket introduces new 
> metrics, which I think shouldn't be backported to branch-2.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-277) PipelineStateMachine should handle closure of pipelines in SCM

2018-07-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558865#comment-16558865
 ] 

Hudson commented on HDDS-277:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14647 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14647/])
HDDS-277. PipelineStateMachine should handle closure of pipelines in (xyao: rev 
fd31cb6cfeef0c7e9bb0a054cb0f78853df8976f)
* (edit) 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/TestContainerPlacement.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerStateManager.java
* (edit) 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/closer/TestContainerCloser.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/states/ContainerStateMap.java
* (edit) 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestCloseContainerEventHandler.java
* (edit) 
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/container/common/helpers/ContainerInfo.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipelines/standalone/StandaloneManagerImpl.java
* (edit) 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestContainerMapping.java
* (edit) 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/TestContainerSQLCli.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipelines/ratis/RatisManagerImpl.java
* (add) 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/pipeline/TestPipelineClose.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipelines/Node2PipelineMap.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/CloseContainerEventHandler.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
* (edit) 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/block/TestBlockManager.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerMapping.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipelines/PipelineManager.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipelines/PipelineSelector.java


> PipelineStateMachine should handle closure of pipelines in SCM
> --
>
> Key: HDDS-277
> URL: https://issues.apache.org/jira/browse/HDDS-277
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-277.001.patch, HDDS-277.002.patch, 
> HDDS-277.003.patch, HDDS-277.005.patch
>
>
> Currently the only visible state of pipelines in SCM is the open state. This 
> jira adds the capability to PipelineStateMachine to close an SCM pipeline and 
> the corresponding open containers on the pipeline. Once all the containers on 
> the pipeline have been closed, the nodes of the pipeline will be released 
> back to the free node pool.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-271) Create a block iterator to iterate blocks in a container

2018-07-26 Thread Bharat Viswanadham (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558862#comment-16558862
 ] 

Bharat Viswanadham commented on HDDS-271:
-

Hi [~nandakumar131]

Thanks for the review and offline discussion.

Addressed your review comments in patch v04.

> Create a block iterator to iterate blocks in a container
> 
>
> Key: HDDS-271
> URL: https://issues.apache.org/jira/browse/HDDS-271
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-271.00.patch, HDDS-271.01.patch, HDDS-271.02.patch, 
> HDDS-271.03.patch, HDDS-271.04.patch
>
>
> Create a block iterator to scan all blocks in a container.
> This will be useful during the implementation of the container scanner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-271) Create a block iterator to iterate blocks in a container

2018-07-26 Thread Bharat Viswanadham (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-271:

Attachment: HDDS-271.04.patch

> Create a block iterator to iterate blocks in a container
> 
>
> Key: HDDS-271
> URL: https://issues.apache.org/jira/browse/HDDS-271
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-271.00.patch, HDDS-271.01.patch, HDDS-271.02.patch, 
> HDDS-271.03.patch, HDDS-271.04.patch
>
>
> Create a block iterator to scan all blocks in a container.
> This will be useful during the implementation of the container scanner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-293) Reduce memory usage in KeyData

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558860#comment-16558860
 ] 

genericqa commented on HDDS-293:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
11s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 32m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 32m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
3s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
49s{color} | {color:green} container-service in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  6m 17s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}142m  7s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerByPipeline
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-293 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933239/HDDS-293.20180726.patch
 |
| 

[jira] [Updated] (HDDS-291) Initialize hadoop metrics system in standalone hdds datanodes

2018-07-26 Thread Xiaoyu Yao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-291:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~elek] for the contribution. I've committed the patch to trunk.

> Initialize hadoop metrics system in standalone hdds datanodes
> -
>
> Key: HDDS-291
> URL: https://issues.apache.org/jira/browse/HDDS-291
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Minor
> Fix For: 0.2.1
>
> Attachments: HDDS-291.001.patch
>
>
> Since HDDS-94 we can start a standalone HDDS datanode process without the 
> HDFS datanode parts.
> But to see the hadoop metrics over the JMX interface we need to initialize 
> the hadoop metrics system (we already have metrics from the storage IO layer).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-277) PipelineStateMachine should handle closure of pipelines in SCM

2018-07-26 Thread Xiaoyu Yao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-277:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~msingh] for the contribution. I've committed the patch to trunk. 

> PipelineStateMachine should handle closure of pipelines in SCM
> --
>
> Key: HDDS-277
> URL: https://issues.apache.org/jira/browse/HDDS-277
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-277.001.patch, HDDS-277.002.patch, 
> HDDS-277.003.patch, HDDS-277.005.patch
>
>
> Currently the only visible state of pipelines in SCM is the open state. This 
> jira adds the capability to PipelineStateMachine to close an SCM pipeline and 
> the corresponding open containers on the pipeline. Once all the containers on 
> the pipeline have been closed, the nodes of the pipeline will be released 
> back to the free node pool.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-277) PipelineStateMachine should handle closure of pipelines in SCM

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558773#comment-16558773
 ] 

genericqa commented on HDDS-277:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 31m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
40s{color} | {color:red} hadoop-hdds/server-scm in trunk has 1 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
53s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 29m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
5s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
36s{color} | {color:green} server-scm in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  4m 23s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}137m 15s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerByPipeline
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-277 |
| JIRA Patch URL | 

[jira] [Commented] (HDDS-252) Eliminate the datanode ID file

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558764#comment-16558764
 ] 

genericqa commented on HDDS-252:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 36 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
42s{color} | {color:red} hadoop-hdds/server-scm in trunk has 1 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 29m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
3s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
55s{color} | {color:green} container-service in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
27s{color} | {color:green} server-scm in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
37s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
34s{color} | {color:green} tools in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
33s{color} | {color:green} integration-test in the patch passed. {color} |
| {color:green}+1{color} | {color:green} 

[jira] [Comment Edited] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider at creation time for consistent UGI handling

2018-07-26 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558750#comment-16558750
 ] 

Zsolt Venczel edited comment on HDFS-13697 at 7/26/18 6:50 PM:
---

The latest patch contains:
* Revert of HDFS-7718 and HADOOP-13749.
* DFSClient creates and caches the KeyProvider at construction time.
* KMSClientProvider holds on to the UGI at creation time and also supports the 
HADOOP-10698 efforts.
* HADOOP-11368 resolves the SSLFactory truststore reloader thread leak.

This patch does not cover the shared, periodic method for checking the 
truststore files. If you agree, it could be covered in a separate jira.




was (Author: zvenczel):
The latest patch contains:
* Revert of HDFS-7718 and HADOOP-13749.
* DFSClient creates and caches the KeyProvider at construction time.
* KMSClientProvider holds on to the UGI at creation time and also supports 
HADOOP-10698 efforts.
* HADOOP-11368 resolves the sslfactory truststore reloader thread leak

This patch does not cover the shared, periodic method for checking the 
truststore files.
If you agree it could be covered in a separate jira.



> DFSClient should instantiate and cache KMSClientProvider at creation time for 
> consistent UGI handling
> -
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack 
> might not have a doAs privileged execution call (in the DFSClient, for 
> example). This results in losing the proxy user from the UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following, for example: if we have set up the oozie user 
> to be entitled to perform actions on behalf of example_user, but oozie is 
> forbidden to decrypt any EDEK (for security reasons), then due to the above 
> issue the example_user entitlements are lost from the UGI and the following 
> error is reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> 

[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider at creation time for consistent UGI handling

2018-07-26 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558750#comment-16558750
 ] 

Zsolt Venczel commented on HDFS-13697:
--

The latest patch contains:
* Revert of HDFS-7718 and HADOOP-13749.
* DFSClient creates and caches the KeyProvider at construction time.
* KMSClientProvider holds on to the UGI at creation time and also supports the 
HADOOP-10698 efforts.
* HADOOP-11368 resolves the SSLFactory truststore reloader thread leak.

This patch does not cover the shared, periodic method for checking the 
truststore files. If you agree, it could be covered in a separate jira.
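
A rough sketch of the UGI-pinning idea mentioned above (illustrative only, not 
the actual KMSClientProvider change; doDecrypt() is a placeholder for the real 
KMS call):

{code:java}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.security.UserGroupInformation;

// Capture the UGI (including any proxy user) when the provider is created and
// run later KMS calls inside that UGI, instead of relying on whatever
// UGI.getCurrentUser() happens to return at call time.
public class UgiPinnedKmsCallSketch {

  private final UserGroupInformation actualUgi;

  public UgiPinnedKmsCallSketch() throws IOException {
    this.actualUgi = UserGroupInformation.getCurrentUser();
  }

  public byte[] decryptEncryptedKey(final byte[] edek)
      throws IOException, InterruptedException {
    return actualUgi.doAs(new PrivilegedExceptionAction<byte[]>() {
      @Override
      public byte[] run() throws IOException {
        return doDecrypt(edek);  // placeholder for the real KMS HTTP call
      }
    });
  }

  private byte[] doDecrypt(byte[] edek) throws IOException {
    throw new UnsupportedOperationException("illustrative placeholder");
  }
}
{code}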



> DFSClient should instantiate and cache KMSClientProvider at creation time for 
> consistent UGI handling
> -
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack 
> might not have a doAs privileged execution call (in the DFSClient, for 
> example). This results in losing the proxy user from the UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following, for example: if we have set up the oozie user 
> to be entitled to perform actions on behalf of example_user, but oozie is 
> forbidden to decrypt any EDEK (for security reasons), then due to the above 
> issue the example_user entitlements are lost from the UGI and the following 
> error is reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> 

[jira] [Commented] (HDFS-8131) Implement a space balanced block placement policy

2018-07-26 Thread Yongjun Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558746#comment-16558746
 ] 

Yongjun Zhang commented on HDFS-8131:
-

Hm, just noticed HDFS-4946 for my comment #3 above. 

Thanks.

> Implement a space balanced block placement policy
> -
>
> Key: HDFS-8131
> URL: https://issues.apache.org/jira/browse/HDFS-8131
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
>  Labels: BlockPlacementPolicy
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: HDFS-8131-branch-2.7.patch, HDFS-8131-v1.diff, 
> HDFS-8131-v2.diff, HDFS-8131-v3.diff, HDFS-8131.004.patch, 
> HDFS-8131.005.patch, HDFS-8131.006.patch, balanced.png
>
>
> The default block placement policy chooses datanodes for new blocks randomly, 
> which results in an unbalanced used-space percentage among datanodes after a 
> cluster expansion. The old datanodes stay at a high used-space percentage 
> while the newly added ones stay at a low percentage.
> Though we can use the external balancer tool to even out the space usage, it 
> costs extra network IO and the balancing speed is not easy to control.
> An easy solution is to implement a space-balanced block placement policy 
> which chooses datanodes with a low used-space percentage for new blocks with 
> a slightly higher probability. Over time, the used-space percentage of the 
> datanodes will trend toward balance.
> Suggestions and discussion are welcome. Thanks.
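
A toy illustration of the weighted-choice idea behind such a policy (not the 
actual AvailableSpaceBlockPlacementPolicy code; the Node type and the 
preferLessUsed knob are assumptions): draw two random candidates and keep the 
emptier one with a configurable probability.

{code:java}
import java.util.List;
import java.util.Random;

// Pick two random candidates; with probability preferLessUsed keep the one
// with the lower used-space ratio, otherwise keep the other one.
public class SpaceBalancedChoiceSketch {

  static class Node {
    final String name;
    final long capacityBytes;
    final long usedBytes;

    Node(String name, long capacityBytes, long usedBytes) {
      this.name = name;
      this.capacityBytes = capacityBytes;
      this.usedBytes = usedBytes;
    }

    double usedRatio() {
      return (double) usedBytes / capacityBytes;
    }
  }

  private final Random random = new Random();
  private final double preferLessUsed = 0.6;  // assumed tuning knob

  Node chooseTarget(List<Node> candidates) {
    Node a = candidates.get(random.nextInt(candidates.size()));
    Node b = candidates.get(random.nextInt(candidates.size()));
    Node lessUsed = a.usedRatio() <= b.usedRatio() ? a : b;
    Node moreUsed = (lessUsed == a) ? b : a;
    return random.nextDouble() < preferLessUsed ? lessUsed : moreUsed;
  }
}
{code}

Because the bias is only probabilistic, placement stays mostly random and the 
used-space percentages converge gradually rather than all new blocks piling 
onto the emptiest nodes.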



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8131) Implement a space balanced block placement policy

2018-07-26 Thread Yongjun Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558739#comment-16558739
 ] 

Yongjun Zhang commented on HDFS-8131:
-

Hi [~liushaohui],

Thanks much for the nice work here.

I have some comments.

1. This jira is described as an "improvement" rather than a new feature; it 
should be a new feature and be documented. 

2. A question related to the question [~Tagar] asked above:

https://issues.apache.org/jira/browse/HDFS-8131?focusedCommentId=15981732=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15981732

Class AvailableSpaceBlockPlacementPolicy extends BlockPlacementPolicyDefault, 
but it doesn't change the behavior of choosing the first node in 
BlockPlacementPolicyDefault. So even with this new feature, the local DN is 
always chosen as the first DN (when it is not excluded, of course), and the new 
feature only changes the selection of the remaining two DNs. 

3. I wonder if we could have another placement policy that can choose a DN 
other than the local DN for the first node, so we don't always pick the local 
DN first.

Would you please share your thoughts?

Thanks.


> Implement a space balanced block placement policy
> -
>
> Key: HDFS-8131
> URL: https://issues.apache.org/jira/browse/HDFS-8131
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
>  Labels: BlockPlacementPolicy
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: HDFS-8131-branch-2.7.patch, HDFS-8131-v1.diff, 
> HDFS-8131-v2.diff, HDFS-8131-v3.diff, HDFS-8131.004.patch, 
> HDFS-8131.005.patch, HDFS-8131.006.patch, balanced.png
>
>
> The default block placement policy chooses datanodes for new blocks randomly, 
> which results in an unbalanced used-space percentage among datanodes after a 
> cluster expansion. The old datanodes stay at a high used-space percentage 
> while the newly added ones stay at a low percentage.
> Though we can use the external balancer tool to even out the space usage, it 
> costs extra network IO and the balancing speed is not easy to control.
> An easy solution is to implement a space-balanced block placement policy 
> which chooses datanodes with a low used-space percentage for new blocks with 
> a slightly higher probability. Over time, the used-space percentage of the 
> datanodes will trend toward balance.
> Suggestions and discussion are welcome. Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider at creation time for consistent UGI handling

2018-07-26 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13697:
-
Attachment: HDFS-13697.04.patch

> DFSClient should instantiate and cache KMSClientProvider at creation time for 
> consistent UGI handling
> -
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack 
> might not have a doAs privileged execution call (in the DFSClient, for 
> example). This results in losing the proxy user from the UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following, for example: if we have set up the oozie user 
> to be entitled to perform actions on behalf of example_user, but oozie is 
> forbidden to decrypt any EDEK (for security reasons), then due to the above 
> issue the example_user entitlements are lost from the UGI and the following 
> error is reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
>  at 
> org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1440)
>  at 
> 

[jira] [Updated] (HDDS-293) Reduce memory usage in KeyData

2018-07-26 Thread Xiaoyu Yao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-293:

Fix Version/s: 0.2.1

> Reduce memory usage in KeyData
> --
>
> Key: HDDS-293
> URL: https://issues.apache.org/jira/browse/HDDS-293
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-293.20180726.patch
>
>
> Currently, the field chunks is declared as a List in KeyData as 
> shown below.
> {code}
> //KeyData.java
>   private List chunks;
> {code}
> It is expected that many KeyData objects only have a single chunk.  We could 
> reduce the memory usage.
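
One way to picture the optimization (a sketch only; ChunkInfo is used here as a 
plain type parameter, not the real protobuf class): keep the common 
single-chunk case out of a full ArrayList and only materialize a list once a 
second chunk arrives.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Holds either a single chunk or, only when needed, a list of chunks.
public class SingleChunkHolderSketch<ChunkInfo> {

  private ChunkInfo singleChunk;      // set when there is exactly one chunk
  private List<ChunkInfo> chunkList;  // allocated only for the multi-chunk case

  public void addChunk(ChunkInfo chunk) {
    if (chunkList != null) {
      chunkList.add(chunk);
    } else if (singleChunk == null) {
      singleChunk = chunk;
    } else {
      // Second chunk arrived: promote to a real list.
      chunkList = new ArrayList<>(2);
      chunkList.add(singleChunk);
      chunkList.add(chunk);
      singleChunk = null;
    }
  }

  public List<ChunkInfo> getChunks() {
    if (chunkList != null) {
      return chunkList;
    }
    return singleChunk == null
        ? Collections.<ChunkInfo>emptyList()
        : Collections.singletonList(singleChunk);
  }
}
{code}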



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-291) Initialize hadoop metrics system in standalone hdds datanodes

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558694#comment-16558694
 ] 

genericqa commented on HDDS-291:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m  
0s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 56s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
43s{color} | {color:green} container-service in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 74m 35s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-291 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933029/HDDS-291.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d69eee8acbae 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a192295 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDDS-Build/642/testReport/ |
| Max. process+thread count | 407 (vs. ulimit of 1) |
| modules | C: hadoop-hdds/container-service U: hadoop-hdds/container-service |
| Console output | 
https://builds.apache.org/job/PreCommit-HDDS-Build/642/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Initialize hadoop metrics system in standalone hdds datanodes
> 

[jira] [Updated] (HDDS-293) Reduce memory usage in KeyData

2018-07-26 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-293:
-
Status: Patch Available  (was: Open)

HDDS-293.20180726.patch: 1st patch

> Reduce memory usage in KeyData
> --
>
> Key: HDDS-293
> URL: https://issues.apache.org/jira/browse/HDDS-293
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-293.20180726.patch
>
>
> Currently, the field chunks is declared as a List in KeyData as 
> shown below.
> {code}
> //KeyData.java
>   private List chunks;
> {code}
> It is expected that many KeyData objects only have a single chunk.  We could 
> reduce the memory usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-293) Reduce memory usage in KeyData

2018-07-26 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-293:
-
Attachment: HDDS-293.20180726.patch

> Reduce memory usage in KeyData
> --
>
> Key: HDDS-293
> URL: https://issues.apache.org/jira/browse/HDDS-293
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-293.20180726.patch
>
>
> Currently, the field chunks is declared as a List in KeyData as 
> shown below.
> {code}
> //KeyData.java
>   private List chunks;
> {code}
> It is expected that many KeyData objects only have a single chunk.  We could 
> reduce the memory usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13622) mkdir should print the parent directory in the error message when parent directories do not exist

2018-07-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558684#comment-16558684
 ] 

Hudson commented on HDFS-13622:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14646 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14646/])
HDFS-13622. mkdir should print the parent directory in the error message (xiao: 
rev be150a17b15d15f5de6d4839d5e805e8d6c57850)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Mkdir.java


> mkdir should print the parent directory in the error message when parent 
> directories do not exist
> -
>
> Key: HDFS-13622
> URL: https://issues.apache.org/jira/browse/HDFS-13622
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Shweta
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13622.02.patch, HDFS-13622.03.patch, 
> HDFS-13622.04.patch, HDFS-13622.05.patch, HDFS-13622.06.patch
>
>
> this is a bit misleading:
> {code}
> $ hdfs  dfs -mkdir /nonexistent/newdir
> mkdir: `/nonexistent/newdir': No such file or directory
> {code}
> I think this command should fail because "nonexistent" doesn't exist...
> the correct output would be:
> {code}
> $ hdfs  dfs -mkdir /nonexistent/newdir
> mkdir: `/nonexistent': No such file or directory
> {code}
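
A sketch of how a shell command could work out which ancestor to report, using 
only the public FileSystem API (this is not the actual Mkdir.java change):

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Walk up from the requested directory until an existing ancestor is found;
// the last non-existing path seen is the one worth printing in the error.
public final class MissingParentSketch {

  private MissingParentSketch() {
  }

  public static Path deepestMissingAncestor(FileSystem fs, Path dir)
      throws IOException {
    Path missing = dir;
    Path parent = dir.getParent();
    while (parent != null && !fs.exists(parent)) {
      missing = parent;
      parent = parent.getParent();
    }
    return missing;  // e.g. /nonexistent for /nonexistent/newdir
  }
}
{code}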



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-26 Thread Kihwal Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558665#comment-16558665
 ] 

Kihwal Lee commented on HDFS-13769:
---

bq. This seems to apply not only for trash dir, but also any directory with 
large amount of data,
You mean the performance hit? Sure.  But the same kind of logic cannot be used 
as a generic solution. It is equivalent to users dividing a large dir structure 
and deleting the pieces individually.  If this logic is applied by default in 
FSShell, it will break the delete semantics.  We might add an option for 
FSShell to delete in this mode, with a clear warning that the delete is no 
longer atomic.  In any case, we can't do this on the RPC server side (i.e. the 
namenode).
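
A hedged sketch of the client-side, non-atomic variant discussed above, using 
only public FileSystem calls (this is not the patch itself; it deletes child by 
child, trading atomicity for many small delete RPCs):

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Delete a big directory tree piece by piece so that no single delete RPC has
// to remove a huge subtree while the namenode holds its write lock.
// Unlike a plain fs.delete(dir, true), this is NOT atomic.
public class IncrementalDeleteSketch {

  public static void deleteIncrementally(FileSystem fs, Path dir)
      throws IOException {
    for (FileStatus child : fs.listStatus(dir)) {
      if (child.isDirectory()) {
        deleteIncrementally(fs, child.getPath());
      } else {
        fs.delete(child.getPath(), false);
      }
    }
    fs.delete(dir, false);  // the directory is empty by now
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    deleteIncrementally(fs, new Path(args[0]));
  }
}
{code}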

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for 
> a long time when deleting a trash dir with a large amount of data. We found 
> this log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implemented a TrashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many 
> files.
> Any thoughts? [~linyiqun]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-268) Add SCM close container watcher

2018-07-26 Thread Ajay Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558663#comment-16558663
 ] 

Ajay Kumar commented on HDDS-268:
-

[~elek] you might be interested in this as it involves some changes in 
EventWatcher.

> Add SCM close container watcher
> ---
>
> Key: HDDS-268
> URL: https://issues.apache.org/jira/browse/HDDS-268
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Ajay Kumar
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-268.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-268) Add SCM close container watcher

2018-07-26 Thread Ajay Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558662#comment-16558662
 ] 

Ajay Kumar commented on HDDS-268:
-

Thanks [~nandakumar131]! I will fix the ASF license issue for 
{{TestCloseContainerWatcher}} along with any review comments.

> Add SCM close container watcher
> ---
>
> Key: HDDS-268
> URL: https://issues.apache.org/jira/browse/HDDS-268
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Ajay Kumar
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-268.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-268) Add SCM close container watcher

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558646#comment-16558646
 ] 

genericqa commented on HDDS-268:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
36s{color} | {color:red} hadoop-hdds/server-scm in trunk has 1 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
21s{color} | {color:red} server-scm in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
51s{color} | {color:red} hadoop-hdds in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 51s{color} 
| {color:red} hadoop-hdds in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
24s{color} | {color:red} server-scm in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 47s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
23s{color} | {color:red} server-scm in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
30s{color} | {color:green} framework in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 25s{color} 
| {color:red} server-scm in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
27s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 60m 34s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-268 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932983/HDDS-268.00.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f5e5ecf72067 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a192295 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HDDS-Build/639/artifact/out/branch-findbugs-hadoop-hdds_server-scm-warnings.html
 |
| 

[jira] [Commented] (HDFS-13622) mkdir should print the parent directory in the error message when parent directories do not exist

2018-07-26 Thread Shweta (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558645#comment-16558645
 ] 

Shweta commented on HDFS-13622:
---

Thank you [~xiaochen] for doing the commit. 

> mkdir should print the parent directory in the error message when parent 
> directories do not exist
> -
>
> Key: HDFS-13622
> URL: https://issues.apache.org/jira/browse/HDFS-13622
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Shweta
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13622.02.patch, HDFS-13622.03.patch, 
> HDFS-13622.04.patch, HDFS-13622.05.patch, HDFS-13622.06.patch
>
>
> this is a bit misleading:
> {code}
> $ hdfs  dfs -mkdir /nonexistent/newdir
> mkdir: `/nonexistent/newdir': No such file or directory
> {code}
> I think this command should fail because "nonexistent" doesn't exist...
> the correct output would be:
> {code}
> $ hdfs  dfs -mkdir /nonexistent/newdir
> mkdir: `/nonexistent': No such file or directory
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13622) mkdir should print the parent directory in the error message when parent directories do not exist

2018-07-26 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558640#comment-16558640
 ] 

Xiao Chen edited comment on HDFS-13622 at 7/26/18 5:25 PM:
---

Failed tests look unrelated and passed locally.
Committed to trunk. Thanks for the contribution [~shwetayakkali]!


was (Author: xiaochen):
Committed to trunk. Thanks for the contribution [~shwetayakkali]!

> mkdir should print the parent directory in the error message when parent 
> directories do not exist
> -
>
> Key: HDFS-13622
> URL: https://issues.apache.org/jira/browse/HDFS-13622
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Shweta
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13622.02.patch, HDFS-13622.03.patch, 
> HDFS-13622.04.patch, HDFS-13622.05.patch, HDFS-13622.06.patch
>
>
> this is a bit misleading:
> {code}
> $ hdfs  dfs -mkdir /nonexistent/newdir
> mkdir: `/nonexistent/newdir': No such file or directory
> {code}
> I think this command should fail because "nonexistent" doesn't exist...
> the correct output would be:
> {code}
> $ hdfs  dfs -mkdir /nonexistent/newdir
> mkdir: `/nonexistent': No such file or directory
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13622) mkdir should print the parent directory in the error message when parent directories do not exist

2018-07-26 Thread Xiao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13622:
-
Summary: mkdir should print the parent directory in the error message when 
parent directories do not exist  (was: mkdir should not print the directory 
being created in the error message when parent directories do not exist)

> mkdir should print the parent directory in the error message when parent 
> directories do not exist
> -
>
> Key: HDFS-13622
> URL: https://issues.apache.org/jira/browse/HDFS-13622
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Shweta
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13622.02.patch, HDFS-13622.03.patch, 
> HDFS-13622.04.patch, HDFS-13622.05.patch, HDFS-13622.06.patch
>
>
> this is a bit misleading:
> {code}
> $ hdfs  dfs -mkdir /nonexistent/newdir
> mkdir: `/nonexistent/newdir': No such file or directory
> {code}
> I think this command should fail because "nonexistent" doesn't exist...
> the correct output would be:
> {code}
> $ hdfs  dfs -mkdir /nonexistent/newdir
> mkdir: `/nonexistent': No such file or directory
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13622) mkdir should print the parent directory in the error message when parent directories do not exist

2018-07-26 Thread Xiao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13622:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for the contribution [~shwetayakkali]!

> mkdir should print the parent directory in the error message when parent 
> directories do not exist
> -
>
> Key: HDFS-13622
> URL: https://issues.apache.org/jira/browse/HDFS-13622
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Shweta
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13622.02.patch, HDFS-13622.03.patch, 
> HDFS-13622.04.patch, HDFS-13622.05.patch, HDFS-13622.06.patch
>
>
> this is a bit misleading:
> {code}
> $ hdfs  dfs -mkdir /nonexistent/newdir
> mkdir: `/nonexistent/newdir': No such file or directory
> {code}
> I think this command should fail because "nonexistent" doesn't exist...
> the correct output would be:
> {code}
> $ hdfs  dfs -mkdir /nonexistent/newdir
> mkdir: `/nonexistent': No such file or directory
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-277) PipelineStateMachine should handle closure of pipelines in SCM

2018-07-26 Thread Xiaoyu Yao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558630#comment-16558630
 ] 

Xiaoyu Yao commented on HDDS-277:
-

[~msingh], can you rebase the patch?

> PipelineStateMachine should handle closure of pipelines in SCM
> --
>
> Key: HDDS-277
> URL: https://issues.apache.org/jira/browse/HDDS-277
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-277.001.patch, HDDS-277.002.patch, 
> HDDS-277.003.patch, HDDS-277.005.patch
>
>
> Currently the only visible state of pipelines in SCM is the open state. This 
> jira adds the capability to PipelineStateMachine to close an SCM pipeline and 
> the corresponding open containers on the pipeline. Once all the containers on 
> the pipeline have been closed, the nodes of the pipeline will be released 
> back to the free node pool.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-291) Initialize hadoop metrics system in standalone hdds datanodes

2018-07-26 Thread Xiaoyu Yao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558610#comment-16558610
 ] 

Xiaoyu Yao commented on HDDS-291:
-

+1, pending Jenkins. I manually triggered a Jenkins run at: 
https://builds.apache.org/job/PreCommit-HDDS-Build/642/.

> Initialize hadoop metrics system in standalone hdds datanodes
> -
>
> Key: HDDS-291
> URL: https://issues.apache.org/jira/browse/HDDS-291
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Minor
> Fix For: 0.2.1
>
> Attachments: HDDS-291.001.patch
>
>
> Since HDDS-94 we can start a standalone HDDS datanode process without the 
> HDFS datanode parts.
> But to see the Hadoop metrics over the JMX interface we need to initialize 
> the Hadoop metrics system (we already have metrics emitted by the storage IO 
> layer).
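
A minimal sketch of what the initialization could look like in the standalone 
datanode entry point (the class name and the "HddsDatanode" prefix are 
assumptions, not the patch):

{code:java}
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

// Bring up the metrics system so that sources registered by the storage IO
// layer become visible over JMX, and shut it down again on service stop.
public class HddsDatanodeMetricsSketch {

  private MetricsSystem metricsSystem;

  public void start() {
    metricsSystem = DefaultMetricsSystem.initialize("HddsDatanode");
  }

  public void stop() {
    if (metricsSystem != null) {
      DefaultMetricsSystem.shutdown();
      metricsSystem = null;
    }
  }
}
{code}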



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-10) docker changes to test secure ozone cluster

2018-07-26 Thread Ajay Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558602#comment-16558602
 ] 

Ajay Kumar commented on HDDS-10:


[~elek] thanks for checking the latest patch. I might be missing something 
here, but I don't see the issuer binary included in patch v3 (the size of patch 
v3 is 16 KB, while patch v2 with the binary files was ~4 MB).

> docker changes to test secure ozone cluster
> ---
>
> Key: HDDS-10
> URL: https://issues.apache.org/jira/browse/HDDS-10
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: HDDS-10-HDDS-4.00.patch, HDDS-10-HDDS-4.01.patch, 
> HDDS-10-HDDS-4.02.patch, HDDS-10-HDDS-4.03.patch
>
>
> Update docker compose and settings to test secure ozone cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13770) dfsadmin -report does not always decrease "missing blocks (with replication factor 1)" metrics when file is deleted

2018-07-26 Thread Kitti Nanasi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558601#comment-16558601
 ] 

Kitti Nanasi commented on HDFS-13770:
-

[~jojochuang], I ran the same tests and 3.x does not have this bug.

> dfsadmin -report does not always decrease "missing blocks (with replication 
> factor 1)" metrics when file is deleted
> ---
>
> Key: HDFS-13770
> URL: https://issues.apache.org/jira/browse/HDFS-13770
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.7
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13770-branch-2.001.patch
>
>
> The "missing blocks (with replication factor 1)" metric is not always 
> decreased when a file is deleted.
> When a file is deleted, the remove function of UnderReplicatedBlocks can be 
> called with the wrong priority (UnderReplicatedBlocks.LEVEL). If it is called 
> with the wrong priority, the corruptReplOneBlocks metric is not decreased, 
> even though the block is removed from the priority queue that contains it.
> The corresponding code:
> {code:java}
> /** remove a block from a under replication queue */
> synchronized boolean remove(BlockInfo block,
>  int oldReplicas,
>  int oldReadOnlyReplicas,
>  int decommissionedReplicas,
>  int oldExpectedReplicas) {
>  final int priLevel = getPriority(oldReplicas, oldReadOnlyReplicas,
>  decommissionedReplicas, oldExpectedReplicas);
>  boolean removedBlock = remove(block, priLevel);
>  if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS &&
>  oldExpectedReplicas == 1 &&
>  removedBlock) {
>  corruptReplOneBlocks--;
>  assert corruptReplOneBlocks >= 0 :
>  "Number of corrupt blocks with replication factor 1 " +
>  "should be non-negative";
>  }
>  return removedBlock;
> }
> /**
>  * Remove a block from the under replication queues.
>  *
>  * The priLevel parameter is a hint of which queue to query
>  * first: if negative or = \{@link #LEVEL} this shortcutting
>  * is not attmpted.
>  *
>  * If the block is not found in the nominated queue, an attempt is made to
>  * remove it from all queues.
>  *
>  * Warning: This is not a synchronized method.
>  * @param block block to remove
>  * @param priLevel expected privilege level
>  * @return true if the block was found and removed from one of the priority 
> queues
>  */
> boolean remove(BlockInfo block, int priLevel) {
>  if(priLevel >= 0 && priLevel < LEVEL
>  && priorityQueues.get(priLevel).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block {}" +
>  " from priority queue {}", block, priLevel);
>  return true;
>  } else {
>  // Try to remove the block from all queues if the block was
>  // not found in the queue for the given priority level.
>  for (int i = 0; i < LEVEL; i++) {
>  if (i != priLevel && priorityQueues.get(i).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block" +
>  " {} from priority queue {}", block, i);
>  return true;
>  }
>  }
>  }
>  return false;
> }
> {code}
> It is already fixed on trunk by HDFS-10999, but that ticket 
> introduces new metrics, which I think shouldn't be backported to branch-2.
>  
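For discussion, one possible shape of a branch-2 style fix, illustrative only (this is not the attached HDFS-13770-branch-2.001.patch; removeAndGetLevel is a hypothetical helper that reports which queue the block was actually removed from):

{code:java}
// Sketched for discussion only -- not the attached branch-2 patch.
// removeAndGetLevel is a hypothetical helper that behaves like
// remove(block, priLevel) but returns the priority queue the block was
// actually removed from (or -1 if it was not found in any queue), so the
// metric is also decremented when the hinted priLevel was wrong.
synchronized boolean remove(BlockInfo block,
                            int oldReplicas,
                            int oldReadOnlyReplicas,
                            int decommissionedReplicas,
                            int oldExpectedReplicas) {
  final int priLevel = getPriority(oldReplicas, oldReadOnlyReplicas,
      decommissionedReplicas, oldExpectedReplicas);
  final int removedLevel = removeAndGetLevel(block, priLevel);
  if (removedLevel == QUEUE_WITH_CORRUPT_BLOCKS && oldExpectedReplicas == 1) {
    corruptReplOneBlocks--;
    assert corruptReplOneBlocks >= 0 :
        "Number of corrupt blocks with replication factor 1 " +
        "should be non-negative";
  }
  return removedLevel >= 0;
}
{code}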



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider at creation time for consistent UGI handling

2018-07-26 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13697:
-
Attachment: (was: HDFS-13697.04.patch)

> DFSClient should instantiate and cache KMSClientProvider at creation time for 
> consistent UGI handling
> -
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack 
> might not have a doAs privileged execution call (in the DFSClient for example). 
> This results in losing the proxy user from the UGI, as UGI.getCurrentUser finds 
> no AccessControllerContext and does a re-login for the login user only.
> This can cause the following, for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user, but oozie is 
> forbidden to decrypt any EDEK (for security reasons), then due to the above issue 
> the example_user entitlements are lost from the UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
>  at 
> org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1440)
>  at 
> 
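The root cause described above is the decrypt call running outside the caller's doAs context. A hedged sketch of the general wrapping pattern (the helper name decryptWithCallerUgi is hypothetical; this is not the attached patch):

{code:java}
// Hedged illustration of the general pattern only -- not the attached patch.
// The idea: resolve the caller's UGI (including the proxy user) once and run
// the KMS call inside doAs, so a later re-login of the login user cannot
// drop the example_user context.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.crypto.key.KeyProvider.KeyVersion;
import org.apache.hadoop.crypto.key.KeyProviderCryptoExtension;
import org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.EncryptedKeyVersion;
import org.apache.hadoop.security.UserGroupInformation;

final class ProxyUserDecryptExample {
  private ProxyUserDecryptExample() {
  }

  static KeyVersion decryptWithCallerUgi(KeyProviderCryptoExtension provider,
      EncryptedKeyVersion edek) throws Exception {
    // Capture the caller's UGI (proxy user included) up front.
    UserGroupInformation caller = UserGroupInformation.getCurrentUser();
    return caller.doAs(
        (PrivilegedExceptionAction<KeyVersion>) () -> provider.decryptEncryptedKey(edek));
  }
}
{code}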

[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider at creation time for consistent UGI handling

2018-07-26 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13697:
-
Attachment: HDFS-13697.04.patch

> DFSClient should instantiate and cache KMSClientProvider at creation time for 
> consistent UGI handling
> -
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack 
> might not have a doAs privileged execution call (in the DFSClient for example). 
> This results in losing the proxy user from the UGI, as UGI.getCurrentUser finds 
> no AccessControllerContext and does a re-login for the login user only.
> This can cause the following, for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user, but oozie is 
> forbidden to decrypt any EDEK (for security reasons), then due to the above issue 
> the example_user entitlements are lost from the UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
>  at 
> org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1440)
>  at 
> 

[jira] [Updated] (HDDS-277) PipelineStateMachine should handle closure of pipelines in SCM

2018-07-26 Thread Mukul Kumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDDS-277:
---
Attachment: HDDS-277.005.patch

> PipelineStateMachine should handle closure of pipelines in SCM
> --
>
> Key: HDDS-277
> URL: https://issues.apache.org/jira/browse/HDDS-277
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-277.001.patch, HDDS-277.002.patch, 
> HDDS-277.003.patch, HDDS-277.005.patch
>
>
> Currently the only visible state of pipelines in SCM is the open state. This 
> jira adds the capability for the PipelineStateMachine to close an SCM pipeline and 
> the corresponding open containers on the pipeline. Once all the containers on the 
> pipeline have been closed, the nodes of the pipeline will be released 
> back to the free node pool.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-277) PipelineStateMachine should handle closure of pipelines in SCM

2018-07-26 Thread Mukul Kumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDDS-277:
---
Attachment: (was: HDDS-277.004.patch)

> PipelineStateMachine should handle closure of pipelines in SCM
> --
>
> Key: HDDS-277
> URL: https://issues.apache.org/jira/browse/HDDS-277
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-277.001.patch, HDDS-277.002.patch, 
> HDDS-277.003.patch
>
>
> Currently the only visible state of pipelines in SCM is the open state. This 
> jira adds the capability for the PipelineStateMachine to close an SCM pipeline and 
> the corresponding open containers on the pipeline. Once all the containers on the 
> pipeline have been closed, the nodes of the pipeline will be released 
> back to the free node pool.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-277) PipelineStateMachine should handle closure of pipelines in SCM

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558530#comment-16558530
 ] 

genericqa commented on HDDS-277:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HDDS-277 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDDS-277 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933230/HDDS-277.004.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDDS-Build/640/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> PipelineStateMachine should handle closure of pipelines in SCM
> --
>
> Key: HDDS-277
> URL: https://issues.apache.org/jira/browse/HDDS-277
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-277.001.patch, HDDS-277.002.patch, 
> HDDS-277.003.patch, HDDS-277.004.patch
>
>
> Currently the only visible state of pipelines in SCM is the open state. This 
> jira adds the capability for the PipelineStateMachine to close an SCM pipeline and 
> the corresponding open containers on the pipeline. Once all the containers on the 
> pipeline have been closed, the nodes of the pipeline will be released 
> back to the free node pool.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-268) Add SCM close container watcher

2018-07-26 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558524#comment-16558524
 ] 

Nanda kumar commented on HDDS-268:
--

Manually triggered Jenkins build: 
https://builds.apache.org/job/PreCommit-HDDS-Build/639/

> Add SCM close container watcher
> ---
>
> Key: HDDS-268
> URL: https://issues.apache.org/jira/browse/HDDS-268
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Ajay Kumar
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-268.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-277) PipelineStateMachine should handle closure of pipelines in SCM

2018-07-26 Thread Mukul Kumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDDS-277:
---
Attachment: HDDS-277.004.patch

> PipelineStateMachine should handle closure of pipelines in SCM
> --
>
> Key: HDDS-277
> URL: https://issues.apache.org/jira/browse/HDDS-277
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-277.001.patch, HDDS-277.002.patch, 
> HDDS-277.003.patch, HDDS-277.004.patch
>
>
> Currently the only visible state of pipelines in SCM is the open state. This 
> jira adds the capability for the PipelineStateMachine to close an SCM pipeline and 
> the corresponding open containers on the pipeline. Once all the containers on the 
> pipeline have been closed, the nodes of the pipeline will be released 
> back to the free node pool.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-277) PipelineStateMachine should handle closure of pipelines in SCM

2018-07-26 Thread Mukul Kumar Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558523#comment-16558523
 ] 

Mukul Kumar Singh commented on HDDS-277:


Thanks for the review [~xyao]. Review comments are incorporated in the latest 
patch.

> PipelineStateMachine should handle closure of pipelines in SCM
> --
>
> Key: HDDS-277
> URL: https://issues.apache.org/jira/browse/HDDS-277
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-277.001.patch, HDDS-277.002.patch, 
> HDDS-277.003.patch, HDDS-277.004.patch
>
>
> Currently the only visible state of pipelines in SCM is the open state. This 
> jira adds the capability for the PipelineStateMachine to close an SCM pipeline and 
> the corresponding open containers on the pipeline. Once all the containers on the 
> pipeline have been closed, the nodes of the pipeline will be released 
> back to the free node pool.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-26 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558514#comment-16558514
 ] 

Chao Sun commented on HDFS-13769:
-

This seems to apply not only to the trash dir but also to any directory with a 
large amount of data. Is that correct, [~Tao Jie]?

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for a 
> long time when deleting a trash dir with a large amount of data. We found the 
> following log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete RPC call. 
> We implemented a TrashPolicy that divides the delete operation into several 
> delete RPCs, so that each single deletion does not delete too many files.
> Any thoughts? [~linyiqun]
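A hedged sketch of that batching idea (the class and method names are illustrative, not the actual TrashPolicy from the attached patch):

{code:java}
// Hedged sketch of the batching idea from the description: instead of one
// recursive delete RPC on a huge trash directory, delete its children one by
// one, so each RPC holds the namesystem write lock for a much shorter time.
// Names here are illustrative, not the actual TrashPolicy from the patch.
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

final class IncrementalTrashDelete {
  private IncrementalTrashDelete() {
  }

  static void deleteInSteps(FileSystem fs, Path trashDir) throws IOException {
    // One delete RPC per immediate child of the trash directory.
    RemoteIterator<FileStatus> children = fs.listStatusIterator(trashDir);
    while (children.hasNext()) {
      fs.delete(children.next().getPath(), true);
    }
    // Finally remove the (now empty) top-level directory.
    fs.delete(trashDir, false);
  }
}
{code}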



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13770) dfsadmin -report does not always decrease "missing blocks (with replication factor 1)" metrics when file is deleted

2018-07-26 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558509#comment-16558509
 ] 

Wei-Chiu Chuang commented on HDFS-13770:


Thanks [~knanasi], really good finding!
HDFS-10999 pertains to erasure coding, so there is no way to backport it to branch-2.

That said, because HDFS-10999 is a huge internal refactor, have you run the 
same test (possibly with some modification) and verified the issue is not reproducible 
in 3.x?

> dfsadmin -report does not always decrease "missing blocks (with replication 
> factor 1)" metrics when file is deleted
> ---
>
> Key: HDFS-13770
> URL: https://issues.apache.org/jira/browse/HDFS-13770
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.7
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13770-branch-2.001.patch
>
>
> The "missing blocks (with replication factor 1)" metric is not always decreased 
> when a file is deleted.
> If a file is deleted, the remove function of UnderReplicatedBlocks can be 
> called with the wrong priority (UnderReplicatedBlocks.LEVEL). When that happens, 
> the corruptReplOneBlocks metric is not decreased, even though the block is 
> removed from the priority queue that contains it.
> The corresponding code:
> {code:java}
> /** remove a block from a under replication queue */
> synchronized boolean remove(BlockInfo block,
>  int oldReplicas,
>  int oldReadOnlyReplicas,
>  int decommissionedReplicas,
>  int oldExpectedReplicas) {
>  final int priLevel = getPriority(oldReplicas, oldReadOnlyReplicas,
>  decommissionedReplicas, oldExpectedReplicas);
>  boolean removedBlock = remove(block, priLevel);
>  if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS &&
>  oldExpectedReplicas == 1 &&
>  removedBlock) {
>  corruptReplOneBlocks--;
>  assert corruptReplOneBlocks >= 0 :
>  "Number of corrupt blocks with replication factor 1 " +
>  "should be non-negative";
>  }
>  return removedBlock;
> }
> /**
>  * Remove a block from the under replication queues.
>  *
>  * The priLevel parameter is a hint of which queue to query
>  * first: if negative or = \{@link #LEVEL} this shortcutting
>  * is not attmpted.
>  *
>  * If the block is not found in the nominated queue, an attempt is made to
>  * remove it from all queues.
>  *
>  * Warning: This is not a synchronized method.
>  * @param block block to remove
>  * @param priLevel expected privilege level
>  * @return true if the block was found and removed from one of the priority 
> queues
>  */
> boolean remove(BlockInfo block, int priLevel) {
>  if(priLevel >= 0 && priLevel < LEVEL
>  && priorityQueues.get(priLevel).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block {}" +
>  " from priority queue {}", block, priLevel);
>  return true;
>  } else {
>  // Try to remove the block from all queues if the block was
>  // not found in the queue for the given priority level.
>  for (int i = 0; i < LEVEL; i++) {
>  if (i != priLevel && priorityQueues.get(i).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block" +
>  " {} from priority queue {}", block, i);
>  return true;
>  }
>  }
>  }
>  return false;
> }
> {code}
> It is already fixed on trunk by HDFS-10999, but that ticket 
> introduces new metrics, which I think shouldn't be backported to branch-2.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-252) Eliminate the datanode ID file

2018-07-26 Thread Bharat Viswanadham (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558491#comment-16558491
 ] 

Bharat Viswanadham commented on HDDS-252:
-

Fixed the findbugs issues in patch v07.

> Eliminate the datanode ID file
> --
>
> Key: HDDS-252
> URL: https://issues.apache.org/jira/browse/HDDS-252
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-252.00.patch, HDDS-252.01.patch, HDDS-252.02.patch, 
> HDDS-252.03.patch, HDDS-252.04.patch, HDDS-252.05.patch, HDDS-252.06.patch, 
> HDDS-252.07.patch
>
>
> This Jira is to remove the datanodeID file. After the ContainerIO work (HDDS-48 
> branch) is merged, we have a version file in each Volume which stores the 
> datanodeUuid and some additional fields.
> Also, if the disk containing the datanodeId path is removed, that DN becomes 
> unusable with the current code.
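A hedged sketch of the direction this points at, reading the datanode UUID back from a per-volume version file; the Java-properties format and the "datanodeUuid" key name are assumptions for illustration, not the exact on-disk layout:

{code:java}
// Illustrative only: recover the datanode UUID from a per-volume VERSION
// file instead of a separate datanode ID file. The properties format and
// the "datanodeUuid" key name are assumptions.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

final class VolumeVersionFileExample {
  private VolumeVersionFileExample() {
  }

  static String readDatanodeUuid(File versionFile) throws IOException {
    Properties props = new Properties();
    try (InputStream in = new FileInputStream(versionFile)) {
      props.load(in);
    }
    // Returns null if the key is missing; a caller could then try the
    // version file of another healthy volume.
    return props.getProperty("datanodeUuid");
  }
}
{code}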



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-252) Eliminate the datanode ID file

2018-07-26 Thread Bharat Viswanadham (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-252:

Attachment: HDDS-252.07.patch

> Eliminate the datanode ID file
> --
>
> Key: HDDS-252
> URL: https://issues.apache.org/jira/browse/HDDS-252
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-252.00.patch, HDDS-252.01.patch, HDDS-252.02.patch, 
> HDDS-252.03.patch, HDDS-252.04.patch, HDDS-252.05.patch, HDDS-252.06.patch, 
> HDDS-252.07.patch
>
>
> This Jira is to remove the datanodeID file. After the ContainerIO work (HDDS-48 
> branch) is merged, we have a version file in each Volume which stores the 
> datanodeUuid and some additional fields.
> Also, if the disk containing the datanodeId path is removed, that DN becomes 
> unusable with the current code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13770) dfsadmin -report does not always decrease "missing blocks (with replication factor 1)" metrics when file is deleted

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558483#comment-16558483
 ] 

genericqa commented on HDFS-13770:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 21m  
6s{color} | {color:red} Docker failed to build yetus/hadoop:f667ef1. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13770 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933221/HDFS-13770-branch-2.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24661/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> dfsadmin -report does not always decrease "missing blocks (with replication 
> factor 1)" metrics when file is deleted
> ---
>
> Key: HDFS-13770
> URL: https://issues.apache.org/jira/browse/HDFS-13770
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.7
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13770-branch-2.001.patch
>
>
> The "missing blocks (with replication factor 1)" metric is not always decreased 
> when a file is deleted.
> If a file is deleted, the remove function of UnderReplicatedBlocks can be 
> called with the wrong priority (UnderReplicatedBlocks.LEVEL). When that happens, 
> the corruptReplOneBlocks metric is not decreased, even though the block is 
> removed from the priority queue that contains it.
> The corresponding code:
> {code:java}
> /** remove a block from a under replication queue */
> synchronized boolean remove(BlockInfo block,
>  int oldReplicas,
>  int oldReadOnlyReplicas,
>  int decommissionedReplicas,
>  int oldExpectedReplicas) {
>  final int priLevel = getPriority(oldReplicas, oldReadOnlyReplicas,
>  decommissionedReplicas, oldExpectedReplicas);
>  boolean removedBlock = remove(block, priLevel);
>  if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS &&
>  oldExpectedReplicas == 1 &&
>  removedBlock) {
>  corruptReplOneBlocks--;
>  assert corruptReplOneBlocks >= 0 :
>  "Number of corrupt blocks with replication factor 1 " +
>  "should be non-negative";
>  }
>  return removedBlock;
> }
> /**
>  * Remove a block from the under replication queues.
>  *
>  * The priLevel parameter is a hint of which queue to query
>  * first: if negative or = \{@link #LEVEL} this shortcutting
>  * is not attmpted.
>  *
>  * If the block is not found in the nominated queue, an attempt is made to
>  * remove it from all queues.
>  *
>  * Warning: This is not a synchronized method.
>  * @param block block to remove
>  * @param priLevel expected privilege level
>  * @return true if the block was found and removed from one of the priority 
> queues
>  */
> boolean remove(BlockInfo block, int priLevel) {
>  if(priLevel >= 0 && priLevel < LEVEL
>  && priorityQueues.get(priLevel).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block {}" +
>  " from priority queue {}", block, priLevel);
>  return true;
>  } else {
>  // Try to remove the block from all queues if the block was
>  // not found in the queue for the given priority level.
>  for (int i = 0; i < LEVEL; i++) {
>  if (i != priLevel && priorityQueues.get(i).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block" +
>  " {} from priority queue {}", block, i);
>  return true;
>  }
>  }
>  }
>  return false;
> }
> {code}
> It is already fixed on trunk by HDFS-10999, but that ticket 
> introduces new metrics, which I think shouldn't be backported to branch-2.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-277) PipelineStateMachine should handle closure of pipelines in SCM

2018-07-26 Thread Xiaoyu Yao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558464#comment-16558464
 ] 

Xiaoyu Yao commented on HDDS-277:
-

[~msingh], thanks for the update. Patch v2 looks good except for one minor comment. 
+1 after that is fixed.

PipelineSelector.java Line 339
{code}
NavigableSet containerIDS = containerStateManager
.getMatchingContainerIDsByPipeline(pipeline.getPipelineName());
if (pipeline.getLifeCycleState() == LifeCycleState.CLOSING &&
containerIDS.size() == 0) {
  updatePipelineState(pipeline, HddsProtos.LifeCycleEvent.CLOSE);
  LOG.info("Closing pipeline. pipelineID: {}", pipeline.getPipelineName());
}
{code}

Can we change it to the following to avoid the unnecessary 
getMatchingContainerIDsByPipeline call?

{code}
if (pipeline.getLifeCycleState() != LifeCycleState.CLOSING) {
  return;
}
NavigableSet containerIDS = containerStateManager
.getMatchingContainerIDsByPipeline(pipeline.getPipelineName());
if  (containerIDS.size() == 0) {
  updatePipelineState(pipeline, HddsProtos.LifeCycleEvent.CLOSE);
  LOG.info("Closing pipeline. pipelineID: {}", pipeline.getPipelineName());
}
{code}


> PipelineStateMachine should handle closure of pipelines in SCM
> --
>
> Key: HDDS-277
> URL: https://issues.apache.org/jira/browse/HDDS-277
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-277.001.patch, HDDS-277.002.patch, 
> HDDS-277.003.patch
>
>
> Currently the only visible state of pipelines in SCM is the open state. This 
> jira adds the capability for the PipelineStateMachine to close an SCM pipeline and 
> the corresponding open containers on the pipeline. Once all the containers on the 
> pipeline have been closed, the nodes of the pipeline will be released 
> back to the free node pool.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13770) dfsadmin -report does not always decrease "missing blocks (with replication factor 1)" metrics when file is deleted

2018-07-26 Thread Kitti Nanasi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kitti Nanasi updated HDFS-13770:

Attachment: HDFS-13770-branch-2.001.patch

> dfsadmin -report does not always decrease "missing blocks (with replication 
> factor 1)" metrics when file is deleted
> ---
>
> Key: HDFS-13770
> URL: https://issues.apache.org/jira/browse/HDFS-13770
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.7
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13770-branch-2.001.patch
>
>
> The "missing blocks (with replication factor 1)" metric is not always decreased 
> when a file is deleted.
> If a file is deleted, the remove function of UnderReplicatedBlocks can be 
> called with the wrong priority (UnderReplicatedBlocks.LEVEL). When that happens, 
> the corruptReplOneBlocks metric is not decreased, even though the block is 
> removed from the priority queue that contains it.
> The corresponding code:
> {code:java}
> /** remove a block from a under replication queue */
> synchronized boolean remove(BlockInfo block,
>  int oldReplicas,
>  int oldReadOnlyReplicas,
>  int decommissionedReplicas,
>  int oldExpectedReplicas) {
>  final int priLevel = getPriority(oldReplicas, oldReadOnlyReplicas,
>  decommissionedReplicas, oldExpectedReplicas);
>  boolean removedBlock = remove(block, priLevel);
>  if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS &&
>  oldExpectedReplicas == 1 &&
>  removedBlock) {
>  corruptReplOneBlocks--;
>  assert corruptReplOneBlocks >= 0 :
>  "Number of corrupt blocks with replication factor 1 " +
>  "should be non-negative";
>  }
>  return removedBlock;
> }
> /**
>  * Remove a block from the under replication queues.
>  *
>  * The priLevel parameter is a hint of which queue to query
>  * first: if negative or = \{@link #LEVEL} this shortcutting
>  * is not attmpted.
>  *
>  * If the block is not found in the nominated queue, an attempt is made to
>  * remove it from all queues.
>  *
>  * Warning: This is not a synchronized method.
>  * @param block block to remove
>  * @param priLevel expected privilege level
>  * @return true if the block was found and removed from one of the priority 
> queues
>  */
> boolean remove(BlockInfo block, int priLevel) {
>  if(priLevel >= 0 && priLevel < LEVEL
>  && priorityQueues.get(priLevel).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block {}" +
>  " from priority queue {}", block, priLevel);
>  return true;
>  } else {
>  // Try to remove the block from all queues if the block was
>  // not found in the queue for the given priority level.
>  for (int i = 0; i < LEVEL; i++) {
>  if (i != priLevel && priorityQueues.get(i).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block" +
>  " {} from priority queue {}", block, i);
>  return true;
>  }
>  }
>  }
>  return false;
> }
> {code}
> It is already fixed on trunk by HDFS-10999, but that ticket 
> introduces new metrics, which I think shouldn't be backported to branch-2.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13770) dfsadmin -report does not always decrease "missing blocks (with replication factor 1)" metrics when file is deleted

2018-07-26 Thread Kitti Nanasi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kitti Nanasi updated HDFS-13770:

Status: Patch Available  (was: Open)

> dfsadmin -report does not always decrease "missing blocks (with replication 
> factor 1)" metrics when file is deleted
> ---
>
> Key: HDFS-13770
> URL: https://issues.apache.org/jira/browse/HDFS-13770
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.7
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13770-branch-2.001.patch
>
>
> The "missing blocks (with replication factor 1)" metric is not always decreased 
> when a file is deleted.
> If a file is deleted, the remove function of UnderReplicatedBlocks can be 
> called with the wrong priority (UnderReplicatedBlocks.LEVEL). When that happens, 
> the corruptReplOneBlocks metric is not decreased, even though the block is 
> removed from the priority queue that contains it.
> The corresponding code:
> {code:java}
> /** remove a block from a under replication queue */
> synchronized boolean remove(BlockInfo block,
>  int oldReplicas,
>  int oldReadOnlyReplicas,
>  int decommissionedReplicas,
>  int oldExpectedReplicas) {
>  final int priLevel = getPriority(oldReplicas, oldReadOnlyReplicas,
>  decommissionedReplicas, oldExpectedReplicas);
>  boolean removedBlock = remove(block, priLevel);
>  if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS &&
>  oldExpectedReplicas == 1 &&
>  removedBlock) {
>  corruptReplOneBlocks--;
>  assert corruptReplOneBlocks >= 0 :
>  "Number of corrupt blocks with replication factor 1 " +
>  "should be non-negative";
>  }
>  return removedBlock;
> }
> /**
>  * Remove a block from the under replication queues.
>  *
>  * The priLevel parameter is a hint of which queue to query
>  * first: if negative or = \{@link #LEVEL} this shortcutting
>  * is not attmpted.
>  *
>  * If the block is not found in the nominated queue, an attempt is made to
>  * remove it from all queues.
>  *
>  * Warning: This is not a synchronized method.
>  * @param block block to remove
>  * @param priLevel expected privilege level
>  * @return true if the block was found and removed from one of the priority 
> queues
>  */
> boolean remove(BlockInfo block, int priLevel) {
>  if(priLevel >= 0 && priLevel < LEVEL
>  && priorityQueues.get(priLevel).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block {}" +
>  " from priority queue {}", block, priLevel);
>  return true;
>  } else {
>  // Try to remove the block from all queues if the block was
>  // not found in the queue for the given priority level.
>  for (int i = 0; i < LEVEL; i++) {
>  if (i != priLevel && priorityQueues.get(i).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block" +
>  " {} from priority queue {}", block, i);
>  return true;
>  }
>  }
>  }
>  return false;
> }
> {code}
> It is already fixed on trunk by HDFS-10999, but that ticket 
> introduces new metrics, which I think shouldn't be backported to branch-2.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-287) Add Close ContainerAction to Datanode#StateContext when the container gets full

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558434#comment-16558434
 ] 

genericqa commented on HDDS-287:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
41s{color} | {color:red} hadoop-hdds/server-scm in trunk has 1 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 30m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
52s{color} | {color:green} container-service in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
35s{color} | {color:green} server-scm in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
33s{color} | {color:green} tools in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  6m 
32s{color} | {color:green} integration-test in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}142m 57s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-287 |
| JIRA Patch URL | 

[jira] [Created] (HDFS-13770) dfsadmin -report does not always decrease "missing blocks (with replication factor 1)" metrics when file is deleted

2018-07-26 Thread Kitti Nanasi (JIRA)
Kitti Nanasi created HDFS-13770:
---

 Summary: dfsadmin -report does not always decrease "missing blocks 
(with replication factor 1)" metrics when file is deleted
 Key: HDFS-13770
 URL: https://issues.apache.org/jira/browse/HDFS-13770
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.7.7
Reporter: Kitti Nanasi
Assignee: Kitti Nanasi


The "missing blocks (with replication factor 1)" metric is not always decreased when 
a file is deleted.

If a file is deleted, the remove function of UnderReplicatedBlocks can be 
called with the wrong priority (UnderReplicatedBlocks.LEVEL). When that happens, 
the corruptReplOneBlocks metric is not decreased, even though the block is 
removed from the priority queue that contains it.

The corresponding code:
{code:java}
/** remove a block from a under replication queue */
synchronized boolean remove(BlockInfo block,
 int oldReplicas,
 int oldReadOnlyReplicas,
 int decommissionedReplicas,
 int oldExpectedReplicas) {
 final int priLevel = getPriority(oldReplicas, oldReadOnlyReplicas,
 decommissionedReplicas, oldExpectedReplicas);
 boolean removedBlock = remove(block, priLevel);
 if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS &&
 oldExpectedReplicas == 1 &&
 removedBlock) {
 corruptReplOneBlocks--;
 assert corruptReplOneBlocks >= 0 :
 "Number of corrupt blocks with replication factor 1 " +
 "should be non-negative";
 }
 return removedBlock;
}

/**
 * Remove a block from the under replication queues.
 *
 * The priLevel parameter is a hint of which queue to query
 * first: if negative or = \{@link #LEVEL} this shortcutting
 * is not attmpted.
 *
 * If the block is not found in the nominated queue, an attempt is made to
 * remove it from all queues.
 *
 * Warning: This is not a synchronized method.
 * @param block block to remove
 * @param priLevel expected privilege level
 * @return true if the block was found and removed from one of the priority 
queues
 */
boolean remove(BlockInfo block, int priLevel) {
 if(priLevel >= 0 && priLevel < LEVEL
 && priorityQueues.get(priLevel).remove(block)) {
 NameNode.blockStateChangeLog.debug(
 "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block {}" +
 " from priority queue {}", block, priLevel);
 return true;
 } else {
 // Try to remove the block from all queues if the block was
 // not found in the queue for the given priority level.
 for (int i = 0; i < LEVEL; i++) {
 if (i != priLevel && priorityQueues.get(i).remove(block)) {
 NameNode.blockStateChangeLog.debug(
 "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block" +
 " {} from priority queue {}", block, i);
 return true;
 }
 }
 }
 return false;
}
{code}
It is already fixed on trunk by HDFS-10999, but that ticket 
introduces new metrics, which I think shouldn't be backported to branch-2.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9260) Improve the performance and GC friendliness of NameNode startup and full block reports

2018-07-26 Thread Kihwal Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558382#comment-16558382
 ] 

Kihwal Lee commented on HDFS-9260:
--

I propose reverting this. HDFS-13671 also reports about 4x slower performance. 
It might help GC, but regular operations are being affected too much.

> Improve the performance and GC friendliness of NameNode startup and full 
> block reports
> --
>
> Key: HDFS-9260
> URL: https://issues.apache.org/jira/browse/HDFS-9260
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, performance
>Affects Versions: 2.7.1
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
>Priority: Major
> Fix For: 3.0.0-alpha1
>
> Attachments: FBR processing.png, HDFS Block and Replica Management 
> 20151013.pdf, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, 
> HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, 
> HDFS-7435.007.patch, HDFS-9260.008.patch, HDFS-9260.009.patch, 
> HDFS-9260.010.patch, HDFS-9260.011.patch, HDFS-9260.012.patch, 
> HDFS-9260.013.patch, HDFS-9260.014.patch, HDFS-9260.015.patch, 
> HDFS-9260.016.patch, HDFS-9260.017.patch, HDFS-9260.018.patch, 
> HDFSBenchmarks.zip, HDFSBenchmarks2.zip
>
>
> This patch changes the data structures used for BlockInfos and Replicas to 
> keep them sorted. This allows faster and more GC-friendly handling of full 
> block reports.
> Would like to hear people's feedback on this change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-201) Add name for LeaseManager

2018-07-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558334#comment-16558334
 ] 

Hudson commented on HDDS-201:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14645 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14645/])
HDDS-201. Add name for LeaseManager. Contributed by Sandeep Nemuri. (nanda: rev 
a19229594e48fad9f50dbdb1f0b2fcbf7443ce66)
* (edit) 
hadoop-hdds/common/src/test/java/org/apache/hadoop/ozone/lease/TestLeaseManager.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerMapping.java
* (edit) 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestReplicationManager.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipelines/PipelineSelector.java
* (edit) 
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/lease/LeaseManager.java
* (edit) 
hadoop-hdds/framework/src/test/java/org/apache/hadoop/hdds/server/events/TestEventWatcher.java


> Add name for LeaseManager
> -
>
> Key: HDDS-201
> URL: https://issues.apache.org/jira/browse/HDDS-201
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Sandeep Nemuri
>Priority: Minor
>  Labels: newbie
> Fix For: 0.2.1
>
> Attachments: HDDS-201.001.patch, HDDS-201.002.patch
>
>
> During the review of HDDS-195 we realised that one server could have multiple 
> LeaseManagers (for example one for the watchers and one for the container 
> creation).
> To make monitoring easier it would be good to use specific names for 
> each lease manager.
> This jira is about adding a new field (name) to the lease manager, which 
> should be defined by a required constructor parameter.
> It should be used in the thread names and in all log messages 
> (something like "Starting CommandWatcher LeaseManager").
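A minimal sketch of the naming pattern asked for here, assuming the name feeds both the monitor thread name and the log messages; this is an illustrative class, not the actual org.apache.hadoop.ozone.lease.LeaseManager change:

{code:java}
// Illustrative sketch only: the name is a required constructor argument and
// shows up in the monitor thread name and in log messages.
import java.util.Objects;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class NamedLeaseManagerSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(NamedLeaseManagerSketch.class);

  private final String name;
  private Thread monitor;

  NamedLeaseManagerSketch(String name) {
    this.name = Objects.requireNonNull(name, "name is required");
  }

  void start() {
    LOG.info("Starting {} LeaseManager", name);
    monitor = new Thread(() -> { /* lease expiry checks would run here */ });
    monitor.setName(name + "-LeaseManager-Monitor");
    monitor.setDaemon(true);
    monitor.start();
  }
}
{code}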



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDDS-271) Create a block iterator to iterate blocks in a container

2018-07-26 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-271:
-
Comment: was deleted

(was: As this is an iterator we can throw NoSuchElementException if the 
iteration has no more element instead of returning null.

If {{hasNext}} returns true we should be able to return the next block on 
{{nextBlock}} call.
Consider a case where we have two blocks [key1:block1, #deleting#key2:block2]. 
For the first {{hasNext}} call we will return {{true}} and the {{nextBlock}} 
call will return key1:block1. For the second {{hasNext}} call we will return 
{{true}} but the {{nextBlock}} call will return {{null}}. This will create 
inconsistent behavior in code wherever the iterator is used. We cannot fully 
rely on {{hasNext}} call anymore.)

> Create a block iterator to iterate blocks in a container
> 
>
> Key: HDDS-271
> URL: https://issues.apache.org/jira/browse/HDDS-271
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-271.00.patch, HDDS-271.01.patch, HDDS-271.02.patch, 
> HDDS-271.03.patch
>
>
> Create a block iterator to scan all blocks in a container.
> This will be useful during the implementation of the container scanner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-271) Create a block iterator to iterate blocks in a container

2018-07-26 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558329#comment-16558329
 ] 

Nanda kumar commented on HDDS-271:
--

As this is an iterator, we can throw NoSuchElementException if the iteration 
has no more elements, instead of returning null.

If hasNext returns true, we should be able to return the next block on the 
nextBlock call.
Consider a case where we have two blocks \[key1:block1, 
#deleting#key2:block2\]. For the first hasNext call we will return true and 
the nextBlock call will return key1:block1. For the second hasNext call we 
will return true, but the nextBlock call will return null. This creates 
inconsistent behavior wherever the iterator is used; we can no longer fully 
rely on the hasNext call.
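
One way to keep hasNext and nextBlock consistent is to pre-fetch the next 
non-deleted block, so hasNext only answers true when a block can actually be 
returned. A rough sketch of the idea; the BlockData type parameter and the 
"#deleting#" prefix check below are simplifications, not the classes from the 
attached patch:

{code:java}
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

// Sketch: wraps a raw key/block iterator and skips keys carrying the
// "#deleting#" prefix, so hasNext() is true only when nextBlock() can succeed.
public class ContainerBlockIteratorSketch<BlockData> {

  private final Iterator<Map.Entry<String, BlockData>> rawIterator;
  private BlockData nextBlock;  // pre-fetched block, null when exhausted

  public ContainerBlockIteratorSketch(
      Iterator<Map.Entry<String, BlockData>> rawIterator) {
    this.rawIterator = rawIterator;
    advance();
  }

  public boolean hasNext() {
    return nextBlock != null;
  }

  public BlockData nextBlock() {
    if (nextBlock == null) {
      throw new NoSuchElementException("No more blocks in this container");
    }
    BlockData current = nextBlock;
    advance();
    return current;
  }

  // Pre-fetch the next entry whose key is not marked for deletion.
  private void advance() {
    nextBlock = null;
    while (rawIterator.hasNext()) {
      Map.Entry<String, BlockData> entry = rawIterator.next();
      if (!entry.getKey().startsWith("#deleting#")) {
        nextBlock = entry.getValue();
        break;
      }
    }
  }
}
{code}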

> Create a block iterator to iterate blocks in a container
> 
>
> Key: HDDS-271
> URL: https://issues.apache.org/jira/browse/HDDS-271
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-271.00.patch, HDDS-271.01.patch, HDDS-271.02.patch, 
> HDDS-271.03.patch
>
>
> Create a block iterator to scan all blocks in a container.
> This one will be useful during implementation of container scanner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDDS-271) Create a block iterator to iterate blocks in a container

2018-07-26 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-271:
-
Comment: was deleted

(was: As this is an iterator, we can throw NoSuchElementException if the 
iteration has no more elements, instead of returning null.

If {{hasNext}} returns true, we should be able to return the next block on the 
{{nextBlock}} call.
Consider a case where we have two blocks [key1:block1, #deleting#key2:block2]. 
For the first {{hasNext}} call we will return {{true}} and the {{nextBlock}} 
call will return key1:block1. For the second {{hasNext}} call we will return 
{{true}}, but the {{nextBlock}} call will return {{null}}. This creates 
inconsistent behavior wherever the iterator is used; we can no longer fully 
rely on the {{hasNext}} call.)

> Create a block iterator to iterate blocks in a container
> 
>
> Key: HDDS-271
> URL: https://issues.apache.org/jira/browse/HDDS-271
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-271.00.patch, HDDS-271.01.patch, HDDS-271.02.patch, 
> HDDS-271.03.patch
>
>
> Create a block iterator to scan all blocks in a container.
> This one will be useful during implementation of container scanner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-271) Create a block iterator to iterate blocks in a container

2018-07-26 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558327#comment-16558327
 ] 

Nanda kumar commented on HDDS-271:
--

As this is an iterator, we can throw NoSuchElementException if the iteration 
has no more elements, instead of returning null.

If {{hasNext}} returns true, we should be able to return the next block on the 
{{nextBlock}} call.
Consider a case where we have two blocks [key1:block1, #deleting#key2:block2]. 
For the first {{hasNext}} call we will return {{true}} and the {{nextBlock}} 
call will return key1:block1. For the second {{hasNext}} call we will return 
{{true}}, but the {{nextBlock}} call will return {{null}}. This creates 
inconsistent behavior wherever the iterator is used; we can no longer fully 
rely on the {{hasNext}} call.

> Create a block iterator to iterate blocks in a container
> 
>
> Key: HDDS-271
> URL: https://issues.apache.org/jira/browse/HDDS-271
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-271.00.patch, HDDS-271.01.patch, HDDS-271.02.patch, 
> HDDS-271.03.patch
>
>
> Create a block iterator to scan all blocks in a container.
> This one will be useful during implementation of container scanner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-271) Create a block iterator to iterate blocks in a container

2018-07-26 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558326#comment-16558326
 ] 

Nanda kumar commented on HDDS-271:
--

As this is an iterator, we can throw NoSuchElementException if the iteration 
has no more elements, instead of returning null.

If {{hasNext}} returns true, we should be able to return the next block on the 
{{nextBlock}} call.
Consider a case where we have two blocks [key1:block1, #deleting#key2:block2]. 
For the first {{hasNext}} call we will return {{true}} and the {{nextBlock}} 
call will return key1:block1. For the second {{hasNext}} call we will return 
{{true}}, but the {{nextBlock}} call will return {{null}}. This creates 
inconsistent behavior wherever the iterator is used; we can no longer fully 
rely on the {{hasNext}} call.

> Create a block iterator to iterate blocks in a container
> 
>
> Key: HDDS-271
> URL: https://issues.apache.org/jira/browse/HDDS-271
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-271.00.patch, HDDS-271.01.patch, HDDS-271.02.patch, 
> HDDS-271.03.patch
>
>
> Create a block iterator to scan all blocks in a container.
> This one will be useful during implementation of container scanner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-201) Add name for LeaseManager

2018-07-26 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558306#comment-16558306
 ] 

Nanda kumar commented on HDDS-201:
--

Thanks [~Sandeep Nemuri] for the contribution and [~elek] for suggesting this 
improvement and reviewing it. I have committed it to trunk.

> Add name for LeaseManager
> -
>
> Key: HDDS-201
> URL: https://issues.apache.org/jira/browse/HDDS-201
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Sandeep Nemuri
>Priority: Minor
>  Labels: newbie
> Fix For: 0.2.1
>
> Attachments: HDDS-201.001.patch, HDDS-201.002.patch
>
>
> During the review of HDDS-195 we realised that one server could have multiple 
> LeaseManagers (for example, one for the watchers and one for container 
> creation).
> To make monitoring easier, it would be good to use a specific name for each 
> lease manager.
> This jira is about adding a new field (name) to the lease manager, which 
> should be set through a constructor parameter and should be required.
> It should be used in the thread names and in all the log messages 
> (something like "Starting CommandWatcher LeaseManager").



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-201) Add name for LeaseManager

2018-07-26 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-201:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add name for LeaseManager
> -
>
> Key: HDDS-201
> URL: https://issues.apache.org/jira/browse/HDDS-201
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Sandeep Nemuri
>Priority: Minor
>  Labels: newbie
> Fix For: 0.2.1
>
> Attachments: HDDS-201.001.patch, HDDS-201.002.patch
>
>
> During the review of HDDS-195 we realised that one server could have multiple 
> LeaseManagers (for example, one for the watchers and one for container 
> creation).
> To make monitoring easier, it would be good to use a specific name for each 
> lease manager.
> This jira is about adding a new field (name) to the lease manager, which 
> should be set through a constructor parameter and should be required.
> It should be used in the thread names and in all the log messages 
> (something like "Starting CommandWatcher LeaseManager").



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-26 Thread Tao Jie (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558292#comment-16558292
 ] 

Tao Jie commented on HDFS-13769:


Hi [~jojochuang], the version of our cluster is 2.8.2, and this patch is based 
on trunk. However, I found that the trash policy logic is almost the same in 
2.8.2 and 3.x.

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for 
> a long time when deleting a trash dir with a large amount of data. We found 
> this log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in a single 
> delete RPC call. We implemented a trashPolicy that divides the delete 
> operation into several delete RPCs, so that no single deletion removes too 
> many files.
> Any thoughts? [~linyiqun]
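
A rough illustration of the batching idea, assuming a plain FileSystem client; 
the class name, threshold, and recursion strategy below are made up for the 
example and are not the attached patch:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: delete a large trash directory with several smaller delete RPCs by
// recursing into its children first, instead of issuing one huge recursive
// delete that holds the namesystem write lock for a long time.
public final class BatchedTrashDeleteSketch {

  // Illustrative threshold; a real policy would make this configurable.
  private static final int MAX_ENTRIES_PER_DELETE = 1000;

  private BatchedTrashDeleteSketch() { }

  public static void delete(FileSystem fs, Path dir) throws IOException {
    FileStatus[] children = fs.listStatus(dir);
    if (children.length > MAX_ENTRIES_PER_DELETE) {
      // Too many direct children: delete each child separately so that no
      // single delete RPC removes too many files at once.
      for (FileStatus child : children) {
        delete(fs, child.getPath());
      }
    }
    // Delete whatever remains with one recursive call (counting only direct
    // children is a simplification made for this sketch).
    fs.delete(dir, true);
  }
}
{code}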



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-201) Add name for LeaseManager

2018-07-26 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558289#comment-16558289
 ] 

Nanda kumar commented on HDDS-201:
--

+1, looks good to me. I will commit this shortly.

> Add name for LeaseManager
> -
>
> Key: HDDS-201
> URL: https://issues.apache.org/jira/browse/HDDS-201
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Sandeep Nemuri
>Priority: Minor
>  Labels: newbie
> Fix For: 0.2.1
>
> Attachments: HDDS-201.001.patch, HDDS-201.002.patch
>
>
> During the review of HDDS-195 we realised that one server could have multiple 
> LeaseManagers (for example, one for the watchers and one for container 
> creation).
> To make monitoring easier, it would be good to use a specific name for each 
> lease manager.
> This jira is about adding a new field (name) to the lease manager, which 
> should be set through a constructor parameter and should be required.
> It should be used in the thread names and in all the log messages 
> (something like "Starting CommandWatcher LeaseManager").



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-287) Add Close ContainerAction to Datanode#StateContext when the container gets full

2018-07-26 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-287:
-
Affects Version/s: 0.2.1
   Status: Patch Available  (was: Open)

> Add Close ContainerAction to Datanode#StateContext when the container gets 
> full
> ---
>
> Key: HDDS-287
> URL: https://issues.apache.org/jira/browse/HDDS-287
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.2.1
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-287.000.patch
>
>
> The Datanode has to send a Close ContainerAction to SCM whenever a container 
> gets full. {{Datanode#StateContext}} has a {{containerActions}} queue from 
> which the ContainerActions are picked up and sent as part of the heartbeat. 
> In this jira we have to add a ContainerAction to the StateContext whenever a 
> container gets full.
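
A self-contained sketch of the intended flow, using stand-in types rather than 
the real HDDS classes and protobufs; the fullness threshold and field names 
below are assumptions for illustration only:

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Self-contained sketch with stand-in types, not the real HDDS classes or
// protobufs: a containerActions queue (as in Datanode#StateContext) plus the
// fullness check that enqueues a CLOSE action for the next heartbeat.
public class CloseContainerActionSketch {

  // Stand-in for the ContainerAction message sent in the heartbeat.
  static final class ContainerAction {
    final long containerId;
    final String action;   // e.g. "CLOSE"
    final String reason;   // e.g. "CONTAINER_FULL"

    ContainerAction(long containerId, String action, String reason) {
      this.containerId = containerId;
      this.action = action;
      this.reason = reason;
    }
  }

  // Stand-in for StateContext#containerActions, drained by the heartbeat task.
  private final Queue<ContainerAction> containerActions =
      new ConcurrentLinkedQueue<>();

  /** Called on the container write path; the 90% threshold is illustrative. */
  public void checkContainerFull(long containerId, long usedBytes,
      long maxSizeBytes) {
    if (usedBytes >= 0.9 * maxSizeBytes) {
      containerActions.add(
          new ContainerAction(containerId, "CLOSE", "CONTAINER_FULL"));
    }
  }
}
{code}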



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-287) Add Close ContainerAction to Datanode#StateContext when the container gets full

2018-07-26 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-287:
-
Attachment: HDDS-287.000.patch

> Add Close ContainerAction to Datanode#StateContext when the container gets 
> full
> ---
>
> Key: HDDS-287
> URL: https://issues.apache.org/jira/browse/HDDS-287
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-287.000.patch
>
>
> The Datanode has to send a Close ContainerAction to SCM whenever a container 
> gets full. {{Datanode#StateContext}} has a {{containerActions}} queue from 
> which the ContainerActions are picked up and sent as part of the heartbeat. 
> In this jira we have to add a ContainerAction to the StateContext whenever a 
> container gets full.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-26 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558280#comment-16558280
 ] 

Wei-Chiu Chuang commented on HDFS-13769:


Just to clarify a bit: did you observe this behavior on a Hadoop 2.8.2 cluster 
as well? Or do you mean that the patch (a new trash policy) is applicable to 
2.8.2? We changed the internal block data structure in Hadoop 3, so I would 
expect the performance regression to happen only on a Hadoop 3 cluster. Thanks

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for 
> a long time when deleting a trash dir with a large amount of data. We found 
> this log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in a single 
> delete RPC call. We implemented a trashPolicy that divides the delete 
> operation into several delete RPCs, so that no single deletion removes too 
> many files.
> Any thoughts? [~linyiqun]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-219) Genearate version-info.properties for hadoop and ozone

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558235#comment-16558235
 ] 

genericqa commented on HDDS-219:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
24s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
33s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 4 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 20s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
9s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
43s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
47s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}128m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-219 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932927/HDDS-219.001.patch |
| Optional 
