[jira] [Commented] (HDDS-318) ratis INFO logs should not be shown during ozoneFs command-line execution

2018-09-08 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608322#comment-16608322
 ] 

Lokesh Jain commented on HDDS-318:
--

[~szetszwo] Thanks for working on this! The patch looks very good to me. I have 
verified that the log messages of ConfUtils do not appear now. Can we make the 
block static?

 
{code:java}
bin/ozone oz -putKey /vol1/bb1/key1 -file /Users/ljain/Downloads/a.txt
2018-09-08 17:04:44,809 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
2018-09-08 17:04:45,368 INFO util.LogUtils: Set org.apache.ratis.conf.ConfUtils 
log level to WARN
2018-09-08 17:04:48,036 INFO util.LogUtils: Set org.apache.ratis.conf.ConfUtils 
log level to WARN
2018-09-08 17:04:50,375 INFO util.LogUtils: Set org.apache.ratis.conf.ConfUtils 
log level to WARN
2018-09-08 17:04:53,099 INFO util.LogUtils: Set org.apache.ratis.conf.ConfUtils 
log level to WARN
{code}
That way, the log message from LogUtils would appear only once.
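
A minimal sketch of the suggestion, assuming the patch sets the level through a 
ratis helper along the lines of {{LogUtils.setLogLevel}} (the class holding the 
block and the exact helper signature are assumptions here, not necessarily what 
the patch does):
{code:java}
// Hypothetical sketch: a static initializer runs once per classloader,
// so the "Set org.apache.ratis.conf.ConfUtils log level to WARN" message
// from LogUtils can be printed at most once instead of once per call.
public final class RatisHelper {
  static {
    LogUtils.setLogLevel(ConfUtils.LOG, Level.WARN);
  }

  private RatisHelper() {
    // utility class, no instances
  }
}
{code}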

 

> ratis INFO logs should not be shown during ozoneFs command-line execution
> --
>
> Key: HDDS-318
> URL: https://issues.apache.org/jira/browse/HDDS-318
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Nilotpal Nandi
>Assignee: Tsz Wo Nicholas Sze
>Priority: Blocker
>  Labels: newbie
> Fix For: 0.2.1
>
> Attachments: HDDS-318.20180907.patch
>
>
> ratis INFO logs should not be shown during ozoneFS CLI execution.
> Please find a snippet from one of the executions:
>  
> {noformat}
> hadoop@08315aa4b367:~/bin$ ./ozone fs -put /etc/passwd /p2
> 2018-08-02 12:17:18 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2018-08-02 12:17:19 INFO ConfUtils:41 - raft.rpc.type = GRPC (default)
> 2018-08-02 12:17:19 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 
> (custom)
> 2018-08-02 12:17:19 INFO ConfUtils:41 - raft.client.rpc.retryInterval = 300 
> ms (default)
> 2018-08-02 12:17:19 INFO ConfUtils:41 - 
> raft.client.async.outstanding-requests.max = 100 (default)
> 2018-08-02 12:17:19 INFO ConfUtils:41 - raft.client.async.scheduler-threads = 
> 3 (default)
> 2018-08-02 12:17:19 INFO ConfUtils:41 - raft.grpc.flow.control.window = 1MB 
> (=1048576) (default)
> 2018-08-02 12:17:19 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 
> (custom)
> 2018-08-02 12:17:20 INFO ConfUtils:41 - raft.client.rpc.request.timeout = 
> 3000 ms (default)
> Aug 02, 2018 12:17:20 PM 
> org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl detectProxy
> WARNING: Failed to construct URI for proxy lookup, proceeding without proxy
> ..
> ..
> ..
>  
> {noformat}
>  






[jira] [Updated] (HDDS-410) ozone scmcli list is not working properly

2018-09-08 Thread Mukul Kumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDDS-410:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the contribution [~hanishakoneru]. I have committed this to trunk.

> ozone scmcli list is not working properly
> -
>
> Key: HDDS-410
> URL: https://issues.apache.org/jira/browse/HDDS-410
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nilotpal Nandi
>Assignee: Hanisha Koneru
>Priority: Minor
> Fix For: 0.3.0
>
> Attachments: HDDS-410.001.patch
>
>
> On running ozone scmcli for a container ID, it gives the following output:
>  
> {noformat}
> [root@ctr-e138-1518143905142-459606-01-02 bin]# ./ozone scmcli list 
> --start=17
> Infinite recursion (StackOverflowError) (through reference chain: 
> 

[jira] [Commented] (HDDS-410) ozone scmcli list is not working properly

2018-09-08 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608309#comment-16608309
 ] 

Hudson commented on HDDS-410:
-

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #14908 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14908/])
HDDS-410. ozone scmcli list is not working properly. Contributed by (msingh: 
rev d924ca2e1a826187de3f430ba30b966f5b5c2d55)
* (edit) 
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/container/common/helpers/ContainerInfo.java


> ozone scmcli list is not working properly
> -
>
> Key: HDDS-410
> URL: https://issues.apache.org/jira/browse/HDDS-410
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nilotpal Nandi
>Assignee: Hanisha Koneru
>Priority: Minor
> Fix For: 0.3.0
>
> Attachments: HDDS-410.001.patch
>
>
> On running ozone scmcli for a container ID, it gives the following output:
>  
> {noformat}
> [root@ctr-e138-1518143905142-459606-01-02 bin]# ./ozone scmcli list 
> --start=17
> Infinite recursion (StackOverflowError) (through reference chain: 
> 

[jira] [Commented] (HDDS-399) Handle pipeline discovery on SCM restart.

2018-09-08 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608306#comment-16608306
 ] 

Anu Engineer commented on HDDS-399:
---

[~msingh], [~shashikant] Thanks for the patch. I have some minor feedback on 
the patch.

 
 # closePipeline -- we have added pipeline.delete(). However, closePipeline 
seems to be invoked for both close and timeout. I am not sure that deleting the 
entry from the pipeline DB on a timeout is the right choice.
 # It would be nice if we could use a formal state machine like the one in 
ContainerStateManager, instead of having the state of a pipeline sprinkled 
all over the code in cases and functions (see the sketch below). We have 
functions like addExistingPipeline(), processNodeReport(), finalizePipeline(), 
and updatePipelineState that modify the state of the pipeline, yet there is no 
one place where the state machine is defined.
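
For illustration, a self-contained toy of the kind of declarative transition 
table being suggested. The state and event names are invented for the sketch 
and are not existing SCM code:
{code:java}
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

// All legal pipeline transitions live in one table, instead of being
// scattered across addExistingPipeline(), processNodeReport(), etc.
class PipelineStateMachine {
  enum State { ALLOCATED, OPEN, CLOSING, CLOSED }
  enum Event { REPORT, FINALIZE, CLOSE, TIMEOUT }

  private final Map<State, Map<Event, State>> table =
      new EnumMap<>(State.class);

  PipelineStateMachine() {
    add(State.ALLOCATED, Event.REPORT,   State.OPEN);
    add(State.OPEN,      Event.FINALIZE, State.CLOSING);
    add(State.OPEN,      Event.TIMEOUT,  State.CLOSING);
    add(State.CLOSING,   Event.CLOSE,    State.CLOSED);
  }

  private void add(State from, Event event, State to) {
    table.computeIfAbsent(from, k -> new EnumMap<>(Event.class))
        .put(event, to);
  }

  /** The single place that decides whether a transition is legal. */
  State transition(State from, Event event) {
    State to = table.getOrDefault(from, Collections.emptyMap()).get(event);
    if (to == null) {
      throw new IllegalStateException(
          "Illegal transition from " + from + " on " + event);
    }
    return to;
  }
}
{code}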

> Handle pipeline discovery on SCM restart.
> -
>
> Key: HDDS-399
> URL: https://issues.apache.org/jira/browse/HDDS-399
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-399.001.patch, HDDS-399.002.patch
>
>
> On SCM restart, as part of node registration, SCM should find out the list of 
> open pipelines on the node. Once all the nodes of a pipeline have reported 
> back, the pipeline should be added as an active pipeline for further 
> allocations.






[jira] [Commented] (HDFS-13862) RBF: Router logs are not capturing few of the dfsrouteradmin commands

2018-09-08 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608100#comment-16608100
 ] 

Hadoop QA commented on HDFS-13862:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m 
11s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m 25s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13862 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938956/HDFS-13862-04.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux fca46c2c4951 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bf8a175 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/25008/testReport/ |
| Max. process+thread count | 1063 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/25008/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> RBF: Router logs are not capturing few 

[jira] [Commented] (HDFS-13862) RBF: Router logs are not capturing few of the dfsrouteradmin commands

2018-09-08 Thread Ayush Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608075#comment-16608075
 ] 

Ayush Saxena commented on HDFS-13862:
-

Thanks [~brahmareddy] for the suggestions and [~elgoiri] for the discussion.
I have updated the patch as per the comments.
Please review!

> RBF: Router logs are not capturing few of the dfsrouteradmin commands
> -
>
> Key: HDFS-13862
> URL: https://issues.apache.org/jira/browse/HDFS-13862
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Soumyapn
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-13862-01.patch, HDFS-13862-02.patch, 
> HDFS-13862-03.patch, HDFS-13862-04.patch
>
>
> Test steps:
> The following commands are not captured in the Router logs.
>  # Destination entry name in the add command. Log says "Added new mount point 
> /apps9 to resolver".
>  # Safemode enter|leave|get commands
>  # nameservice enable






[jira] [Updated] (HDFS-13862) RBF: Router logs are not capturing few of the dfsrouteradmin commands

2018-09-08 Thread Ayush Saxena (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-13862:

Attachment: HDFS-13862-04.patch

> RBF: Router logs are not capturing few of the dfsrouteradmin commands
> -
>
> Key: HDFS-13862
> URL: https://issues.apache.org/jira/browse/HDFS-13862
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Soumyapn
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-13862-01.patch, HDFS-13862-02.patch, 
> HDFS-13862-03.patch, HDFS-13862-04.patch
>
>
> Test steps:
> The following commands are not captured in the Router logs.
>  # Destination entry name in the add command. Log says "Added new mount point 
> /apps9 to resolver".
>  # Safemode enter|leave|get commands
>  # nameservice enable






[jira] [Commented] (HDDS-416) Fix bug in ChunkInputStreamEntry

2018-09-08 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608048#comment-16608048
 ] 

Lokesh Jain commented on HDDS-416:
--

[~nandakumar131] currentPosition is not updated in the seek call. It is used in 
the getRemaining() call, which is in turn used in a read call by 
ChunkGroupInputStream.
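
A hedged sketch of the interaction (a simplified stand-in, not the actual 
class; only the fields relevant to this discussion are modeled):
{code:java}
// Toy model of ChunkInputStreamEntry: currentPosition is a cache that
// only read() refreshes. After a seek(), getRemaining() works off the
// stale cache, and ChunkGroupInputStream sizes its next read from it.
// Deriving the value from getPos() removes the redundant state.
class ChunkEntry {
  private final long length;
  private long streamPos;        // position of the underlying stream
  private long currentPosition;  // cached copy, updated only by read()

  ChunkEntry(long length) { this.length = length; }

  void read(long n)      { streamPos += n; currentPosition = streamPos; }
  void seek(long target) { streamPos = target; /* cache NOT updated */ }

  long getPos()          { return streamPos; }

  long getRemaining()      { return length - currentPosition; } // stale after seek
  long getRemainingFixed() { return length - getPos(); }        // always in sync
}
{code}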

> Fix bug in ChunkInputStreamEntry
> 
>
> Key: HDDS-416
> URL: https://issues.apache.org/jira/browse/HDDS-416
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-416.001.patch
>
>
> ChunkInputStreamEntry maintains currentPosition field. This field is 
> redundant and can be replaced by getPos().






[jira] [Commented] (HDFS-12136) BlockSender performance regression due to volume scanner edge case

2018-09-08 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607968#comment-16607968
 ] 

Hadoop QA commented on HDFS-12136:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HDFS-12136 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12136 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12877142/HDFS-12136.trunk.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/25007/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> BlockSender performance regression due to volume scanner edge case
> --
>
> Key: HDFS-12136
> URL: https://issues.apache.org/jira/browse/HDFS-12136
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-12136.branch-2.patch, HDFS-12136.trunk.patch
>
>
> HDFS-11160 attempted to fix a volume scan race for a file appended mid-scan 
> by reading the last checksum of finalized blocks within the {{BlockSender}} 
> ctor.  Unfortunately it holds the exclusive dataset lock to open and read 
> the metafile multiple times, so block sender instantiation becomes serialized.
> Performance completely collapses under heavy disk i/o utilization or high 
> xceiver activity, e.g. lost-node replication, balancing, or decommissioning.  
> The xceiver threads congest creating block senders and impair the heartbeat 
> processing that is contending for the same lock.  Combined with other lock 
> contention issues, pipelines break and nodes sporadically go dead.
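
To make the locking pattern concrete, a hedged sketch (names simplified; not 
the actual BlockSender code): doing the metafile I/O while holding the 
exclusive dataset lock serializes every BlockSender construction, whereas 
snapshotting the replica under the lock and reading the checksum outside it 
would keep constructions parallel.
{code:java}
// Contended shape: disk I/O performed under the exclusive lock.
synchronized (datasetLock) {
  lastChecksum = readLastChecksum(block);   // opens and reads the metafile
}

// Less contended shape: only cheap in-memory work under the lock.
final ReplicaInfo replica;
synchronized (datasetLock) {
  replica = getReplicaInfo(block);          // in-memory lookup only
}
lastChecksum = readLastChecksum(replica);   // I/O without the lock
{code}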






[jira] [Commented] (HDFS-7967) Reduce the performance impact of the balancer

2018-09-08 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607967#comment-16607967
 ] 

Hadoop QA commented on HDFS-7967:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} HDFS-7967 does not apply to branch-2.8. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-7967 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12847070/HDFS-7967.branch-2.8.003.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/25006/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Reduce the performance impact of the balancer
> -
>
> Key: HDFS-7967
> URL: https://issues.apache.org/jira/browse/HDFS-7967
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-7967-branch-2.8.patch, HDFS-7967-branch-2.patch, 
> HDFS-7967.branch-2-1.patch, HDFS-7967.branch-2.001.patch, 
> HDFS-7967.branch-2.002.patch, HDFS-7967.branch-2.8-1.patch, 
> HDFS-7967.branch-2.8.001.patch, HDFS-7967.branch-2.8.002.patch, 
> HDFS-7967.branch-2.8.003.patch
>
>
> The balancer needs to query for blocks to move from overly full DNs.  The 
> block lookup is extremely inefficient.  An iterator of the node's blocks is 
> created from the iterators of its storages' blocks.  A random number is 
> chosen corresponding to how many blocks will be skipped via the iterator.  
> Each skip requires costly scanning of triplets.
> The current design also only considers node imbalances while ignoring 
> imbalances within the node's storages.  A more efficient and intelligent 
> design may eliminate the costly skipping of blocks via round-robin selection 
> of blocks from the storages based on remaining capacity.
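
As a rough illustration of the round-robin alternative, a self-contained toy 
(not the Balancer's actual types); ordering the storages by remaining capacity 
before building the picker is left out for brevity:
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Each storage keeps its own block iterator; candidates are drawn by
// rotating across storages, so no random skipping over triplets is
// needed and each pick is O(1).
class RoundRobinBlockPicker<B> {
  private final Deque<Iterator<B>> storages = new ArrayDeque<>();

  RoundRobinBlockPicker(List<Iterator<B>> perStorageBlocks) {
    storages.addAll(perStorageBlocks);
  }

  /** Returns the next candidate block, or null once all storages drain. */
  B next() {
    while (!storages.isEmpty()) {
      Iterator<B> it = storages.poll();
      if (it.hasNext()) {
        B block = it.next();
        storages.add(it);   // rotate this storage to the back
        return block;
      }
      // an exhausted storage simply drops out of the rotation
    }
    return null;
  }
}
{code}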






[jira] [Updated] (HDFS-12914) Block report leases cause missing blocks until next report

2018-09-08 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-12914:
--
Target Version/s: 2.8.6  (was: 2.8.5)

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and 
> is interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected due to an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues, possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DN's 
> next FBR is sent and/or forced.
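
A hedged sketch of the conflation being described (simplified; not the actual 
NameNodeRpcServer code):
{code:java}
// checkLease() signals rejection by returning false rather than
// throwing, and that false is folded into the same signal as "no stale
// storages" -- so a rejected FBR looks like a clean, empty report and
// the node goes active with no blocks.
boolean leaseValid = blockReportLeaseManager.checkLease(node, leaseId);
if (!leaseValid) {
  // One possible shape of a fix: surface the rejection to the DN so it
  // retries the full block report with a fresh lease instead of having
  // the report silently dropped.
  throw new RetriableException("Rejected block report lease " + leaseId);
}
{code}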






[jira] [Updated] (HDFS-12136) BlockSender performance regression due to volume scanner edge case

2018-09-08 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-12136:
--
Target Version/s: 2.8.6  (was: 2.8.5)

> BlockSender performance regression due to volume scanner edge case
> --
>
> Key: HDFS-12136
> URL: https://issues.apache.org/jira/browse/HDFS-12136
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-12136.branch-2.patch, HDFS-12136.trunk.patch
>
>
> HDFS-11160 attempted to fix a volume scan race for a file appended mid-scan 
> by reading the last checksum of finalized blocks within the {{BlockSender}} 
> ctor.  Unfortunately it holds the exclusive dataset lock to open and read 
> the metafile multiple times, so block sender instantiation becomes serialized.
> Performance completely collapses under heavy disk i/o utilization or high 
> xceiver activity, e.g. lost-node replication, balancing, or decommissioning.  
> The xceiver threads congest creating block senders and impair the heartbeat 
> processing that is contending for the same lock.  Combined with other lock 
> contention issues, pipelines break and nodes sporadically go dead.






[jira] [Updated] (HDFS-7967) Reduce the performance impact of the balancer

2018-09-08 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-7967:
-
Target Version/s: 2.8.6  (was: 2.8.5)

> Reduce the performance impact of the balancer
> -
>
> Key: HDFS-7967
> URL: https://issues.apache.org/jira/browse/HDFS-7967
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-7967-branch-2.8.patch, HDFS-7967-branch-2.patch, 
> HDFS-7967.branch-2-1.patch, HDFS-7967.branch-2.001.patch, 
> HDFS-7967.branch-2.002.patch, HDFS-7967.branch-2.8-1.patch, 
> HDFS-7967.branch-2.8.001.patch, HDFS-7967.branch-2.8.002.patch, 
> HDFS-7967.branch-2.8.003.patch
>
>
> The balancer needs to query for blocks to move from overly full DNs.  The 
> block lookup is extremely inefficient.  An iterator of the node's blocks is 
> created from the iterators of its storages' blocks.  A random number is 
> chosen corresponding to how many blocks will be skipped via the iterator.  
> Each skip requires costly scanning of triplets.
> The current design also only considers node imbalances while ignoring 
> imbalances within the node's storages.  A more efficient and intelligent 
> design may eliminate the costly skipping of blocks via round-robin selection 
> of blocks from the storages based on remaining capacity.






[jira] [Updated] (HDFS-13111) Close recovery may incorrectly mark blocks corrupt

2018-09-08 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-13111:
--
Target Version/s: 2.8.6  (was: 2.8.5)

> Close recovery may incorrectly mark blocks corrupt
> --
>
> Key: HDFS-13111
> URL: https://issues.apache.org/jira/browse/HDFS-13111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> Close recovery can leave a block marked corrupt until the next FBR arrives 
> from one of the DNs.  The reason is unclear, but it has happened multiple 
> times when a DN has I/O-saturated disks.






[jira] [Updated] (HDFS-12704) FBR may corrupt block state

2018-09-08 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-12704:
--
Target Version/s: 2.8.6  (was: 2.8.5)

> FBR may corrupt block state
> ---
>
> Key: HDFS-12704
> URL: https://issues.apache.org/jira/browse/HDFS-12704
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> If FBR processing generates a runtime exception it is believed to foul the 
> block state and lead to unpredictable behavior.






[jira] [Updated] (HDFS-12703) Exceptions are fatal to decommissioning monitor

2018-09-08 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-12703:
--
Target Version/s: 2.8.6  (was: 2.8.5)

> Exceptions are fatal to decommissioning monitor
> ---
>
> Key: HDFS-12703
> URL: https://issues.apache.org/jira/browse/HDFS-12703
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> The {{DecommissionManager.Monitor}} runs as an executor scheduled task.  If 
> an exception occurs, all decommissioning ceases until the NN is restarted.  
> Per javadoc for {{executor#scheduleAtFixedRate}}: *If any execution of the 
> task encounters an exception, subsequent executions are suppressed*.  The 
> monitor thread is alive but blocked waiting for an executor task that will 
> never come.  The code currently disposes of the future so the actual 
> exception that aborted the task is gone.
> Failover is insufficient since the task is also likely dead on the standby.  
> Replication queue init after the transition to active will fix the 
> under-replication of blocks on currently decommissioning nodes, but future 
> nodes never decommission.  The standby must be bounced prior to failover -- 
> and hopefully the error condition does not reoccur.
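
The quoted javadoc behavior is easy to demonstrate in isolation. This 
self-contained snippet (plain JDK, unrelated to the NN code) shows that an 
uncaught exception permanently cancels a scheduleAtFixedRate task, while a 
task that catches its own exceptions keeps running:
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SuppressedTask {
  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService ses = Executors.newScheduledThreadPool(2);

    // Prints "tick" once: the uncaught exception suppresses every
    // subsequent execution, and nothing is ever logged.
    ses.scheduleAtFixedRate(() -> {
      System.out.println("tick");
      throw new RuntimeException("boom");
    }, 0, 100, TimeUnit.MILLISECONDS);

    // Keeps printing: the task defends itself with its own try/catch.
    ses.scheduleAtFixedRate(() -> {
      try {
        throw new RuntimeException("boom");
      } catch (RuntimeException e) {
        System.out.println("caught, schedule survives");
      }
    }, 0, 100, TimeUnit.MILLISECONDS);

    Thread.sleep(350);
    ses.shutdownNow();
  }
}
{code}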






[jira] [Updated] (HDFS-12747) Lease monitor may infinitely loop on the same lease

2018-09-08 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-12747:
--
Target Version/s: 2.8.6  (was: 2.8.5)

> Lease monitor may infinitely loop on the same lease
> ---
>
> Key: HDFS-12747
> URL: https://issues.apache.org/jira/browse/HDFS-12747
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
>
> Lease recovery incorrectly handles UC files if the last block is complete but 
> the penultimate block is committed.  "Incorrectly handles" is a euphemism for 
> infinitely looping for days and leaving all abandoned streams open until 
> customers complain.
> The problem may manifest when:
> # Block1 is committed but seemingly never completed
> # Block2 is allocated
> # Lease recovery is initiated for block2
> # Commit block synchronization invokes {{FSNamesytem#closeFileCommitBlocks}}, 
> causing:
> #* {{commitOrCompleteLastBlock}} to mark block2 as complete
> #* 
> {{finalizeINodeFileUnderConstruction}}/{{INodeFile.assertAllBlocksComplete}} 
> to throw {{IllegalStateException}} because the penultimate block1 is 
> "COMMITTED but not COMPLETE"
> # The next lease recovery results in an infinite loop.
> The {{LeaseManager}} expects that {{FSNamesystem#internalReleaseLease}} will 
> either init recovery and renew the lease, or remove the lease.  In the 
> described state it does neither.  The switch case will break out if the last 
> block is complete.  (The case statement ironically contains an assert).  
> Since nothing changed, the lease is still the “next” lease to be processed.  
> The lease monitor loops for 25ms on the same lease, sleeps for 2s, loops on 
> it again.
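
A toy model of the stuck loop (hedged; the lease handling is reduced to the 
shape described above and is not the actual LeaseManager code):
{code:java}
// The monitor repeatedly takes the oldest expired lease, expecting
// internalReleaseLease() to either renew it (reordering the set) or
// remove it. When it does neither -- the state described above --
// first() keeps returning the same lease, the loop burns its ~25ms
// budget without progress, sleeps ~2s, and starts over on the same lease.
long start = monotonicNow();
while (!sortedLeases.isEmpty()
    && sortedLeases.first().expiredHardLimit()
    && monotonicNow() - start < 25) {
  Lease leaseToCheck = sortedLeases.first();
  internalReleaseLease(leaseToCheck);   // neither renews nor removes it
}
{code}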






[jira] [Commented] (HDFS-12747) Lease monitor may infinitely loop on the same lease

2018-09-08 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607955#comment-16607955
 ] 

Junping Du commented on HDFS-12747:
---

Moving this out to 2.8.6 given no progress for a while.

> Lease monitor may infinitely loop on the same lease
> ---
>
> Key: HDFS-12747
> URL: https://issues.apache.org/jira/browse/HDFS-12747
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
>
> Lease recovery incorrectly handles UC files if the last block is complete but 
> the penultimate block is committed.  "Incorrectly handles" is a euphemism for 
> infinitely looping for days and leaving all abandoned streams open until 
> customers complain.
> The problem may manifest when:
> # Block1 is committed but seemingly never completed
> # Block2 is allocated
> # Lease recovery is initiated for block2
> # Commit block synchronization invokes {{FSNamesytem#closeFileCommitBlocks}}, 
> causing:
> #* {{commitOrCompleteLastBlock}} to mark block2 as complete
> #* 
> {{finalizeINodeFileUnderConstruction}}/{{INodeFile.assertAllBlocksComplete}} 
> to throw {{IllegalStateException}} because the penultimate block1 is 
> "COMMITTED but not COMPLETE"
> # The next lease recovery results in an infinite loop.
> The {{LeaseManager}} expects that {{FSNamesystem#internalReleaseLease}} will 
> either init recovery and renew the lease, or remove the lease.  In the 
> described state it does neither.  The switch case will break out if the last 
> block is complete.  (The case statement ironically contains an assert).  
> Since nothing changed, the lease is still the “next” lease to be processed.  
> The lease monitor loops for 25ms on the same lease, sleeps for 2s, loops on 
> it again.






[jira] [Commented] (HDDS-410) ozone scmcli list is not working properly

2018-09-08 Thread Mukul Kumar Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607953#comment-16607953
 ] 

Mukul Kumar Singh commented on HDDS-410:


Thanks for working on this [~hanishakoneru].
+1, the patch looks good to me. I will commit this shortly.

> ozone scmcli list is not working properly
> -
>
> Key: HDDS-410
> URL: https://issues.apache.org/jira/browse/HDDS-410
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nilotpal Nandi
>Assignee: Hanisha Koneru
>Priority: Minor
> Fix For: 0.3.0
>
> Attachments: HDDS-410.001.patch
>
>
> On running ozone scmcli for a container ID, it gives the following output:
>  
> {noformat}
> [root@ctr-e138-1518143905142-459606-01-02 bin]# ./ozone scmcli list 
> --start=17
> Infinite recursion (StackOverflowError) (through reference chain: 
> 

[jira] [Commented] (HDDS-369) Remove the containers of a dead node from the container state map

2018-09-08 Thread LiXin Ge (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607934#comment-16607934
 ] 

LiXin Ge commented on HDDS-369:
---

Hi [~elek], sorry for the late review. Is there a chance of a 
NullPointerException in {{DeadNodeHandler#onMessage}} in the situation below?
1. A new datanode registers with SCM.
2. No container has been allocated on the new datanode yet.
3. The new datanode dies and an event is fired to {{DeadNodeHandler}}.
4. In {{onMessage}}, the lookup in {{node2ContainerMap}} finds nothing, so 
{{containers}} will be {{null}}.
5. A NullPointerException will be thrown in the subsequent iteration over 
{{containers}}.

Shall we iterate over {{containers}} only when it is not null? I can create a 
new Jira to fix this if needed.
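
A minimal sketch of the suggested guard, with simplified signatures (the 
actual {{DeadNodeHandler}} types may differ):
{code:java}
// Skip the replica update entirely when the dead node never had any
// containers recorded, instead of iterating over a null set.
Set<ContainerID> containers =
    node2ContainerMap.getContainers(datanodeDetails.getUuid());
if (containers == null) {
  LOG.info("No containers recorded for dead node {}; nothing to update.",
      datanodeDetails.getUuid());
  return;
}
for (ContainerID containerID : containers) {
  removeReplica(containerID, datanodeDetails);  // existing handling
}
{code}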

> Remove the containers of a dead node from the container state map
> -
>
> Key: HDDS-369
> URL: https://issues.apache.org/jira/browse/HDDS-369
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-369.001.patch, HDDS-369.002.patch, 
> HDDS-369.003.patch, HDDS-369.004.patch, HDDS-369.005.patch, HDDS-369.006.patch
>
>
> When a node is dead, we need to update the container replica 
> information in the containerStateMap for all the containers on that 
> specific node.
> By removing the replica information we can detect the under-replicated 
> state and start the replication.






[jira] [Commented] (HDDS-190) Improve shell error message for unrecognized option

2018-09-08 Thread Sandeep Nemuri (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607930#comment-16607930
 ] 

Sandeep Nemuri commented on HDDS-190:
-

Thanks [~elek] and [~dineshchitlangia] for jumping in and providing the fix :)

> Improve shell error message for unrecognized option
> ---
>
> Key: HDDS-190
> URL: https://issues.apache.org/jira/browse/HDDS-190
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Arpit Agarwal
>Assignee: Elek, Marton
>Priority: Blocker
>  Labels: newbie
> Fix For: 0.2.1
>
> Attachments: HDDS-190.001.patch, HDDS-190.002.patch
>
>
> The error message with an unrecognized option is unfriendly. E.g.
> {code}
> $ ozone oz -badOption
> Unrecognized option: -badOptionERROR: null
> {code}






[jira] [Commented] (HDDS-416) Fix bug in ChunkInputStreamEntry

2018-09-08 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607926#comment-16607926
 ] 

Nanda kumar commented on HDDS-416:
--

[~ljain], the existing logic doesn't look like a bug. This is more of a 
performance optimization: rather than checking the chunkOffset and buffer 
position of {{ChunkInputStream}} each time, {{currentPosition}} in 
{{ChunkInputStreamEntry}} caches the value, which can be returned directly to 
the caller.

> Fix bug in ChunkInputStreamEntry
> 
>
> Key: HDDS-416
> URL: https://issues.apache.org/jira/browse/HDDS-416
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-416.001.patch
>
>
> ChunkInputStreamEntry maintains currentPosition field. This field is 
> redundant and can be replaced by getPos().






[jira] [Commented] (HDDS-362) Modify functions impacted by SCM chill mode in ScmBlockLocationProtocol

2018-09-08 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607924#comment-16607924
 ] 

Hadoop QA commented on HDDS-362:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
56s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
30s{color} | {color:green} server-scm in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 57s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-362 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938937/HDDS-362.01.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  javadoc  
mvninstall  shadedclient  findbugs  checkstyle  |
| uname | Linux d68b6f255590 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bf8a175 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results |