[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2020-07-18 Thread dmichal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17160540#comment-17160540
 ] 

dmichal commented on HDFS-15000:


I have one more idea about how this issue could be solved. In addition to 
global locks (shared for all blocks), block-specific locks can be introduced. 
In such a case {{FsDatasetImpl.createRbw()}} could work as follows:
{code:java}
block-specific write lock {
  global lock {
   1. check if block_id exists
   2. other checks
   3. ...
  }
  4. perform the IO
  global lock {
   5. update the volume map
  }
}
{code}
Then the race conditions that [~sodonnell] pointed out in [~Aiphag0]'s solution 
won't occur, since no other thread will be allowed to access the same block 
concurrently.
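
A minimal sketch of how the two layers could fit together, assuming a fixed pool of 
striped per-block locks; the class, field and helper names below are illustrative and 
are not the existing {{FsDatasetImpl}} API:
{code:java}
import java.io.IOException;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative only: block-specific lock striping layered over a shared dataset lock. */
class BlockLockingSketch {
  private static final int STRIPES = 1024;
  private final Lock[] blockLocks = new Lock[STRIPES];                              // block-specific locks
  private final ReentrantReadWriteLock datasetLock = new ReentrantReadWriteLock();  // global lock

  BlockLockingSketch() {
    for (int i = 0; i < STRIPES; i++) {
      blockLocks[i] = new ReentrantLock();
    }
  }

  private Lock lockFor(long blockId) {
    return blockLocks[(int) Math.floorMod(blockId, (long) STRIPES)];
  }

  void createRbw(long blockId) throws IOException {
    Lock blockLock = lockFor(blockId);
    blockLock.lock();                          // serialize all work on this block id
    try {
      datasetLock.writeLock().lock();          // steps 1-3: in-memory checks only
      try {
        checkBlockDoesNotExist(blockId);
        runOtherChecks(blockId);
      } finally {
        datasetLock.writeLock().unlock();
      }

      performIo(blockId);                      // step 4: IO outside the global lock

      datasetLock.writeLock().lock();          // step 5: publish to the volume map
      try {
        updateVolumeMap(blockId);
      } finally {
        datasetLock.writeLock().unlock();
      }
    } finally {
      blockLock.unlock();
    }
  }

  // Placeholders standing in for the real FsDatasetImpl logic.
  private void checkBlockDoesNotExist(long blockId) throws IOException {}
  private void runOtherChecks(long blockId) throws IOException {}
  private void performIo(long blockId) throws IOException {}
  private void updateVolumeMap(long blockId) {}
}
{code}
Striping keeps the number of lock objects bounded; two block ids may share a stripe, 
which only costs some extra serialization, not correctness.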

> Improve FsDatasetImpl to avoid IO operation in datasetLock
> --
>
> Key: HDFS-15000
> URL: https://issues.apache.org/jira/browse/HDFS-15000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15000.001.patch
>
>
> As HDFS-14997 mentioned, some methods in #FsDatasetImpl, such as 
> #finalizeBlock, #finalizeReplica and #createRbw, perform IO operations while 
> holding the datasetLock. This blocks other logic when the IO load is very high. 
> We should make the lock finer-grained or move the IO operations out of the 
> datasetLock.






[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2020-07-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157669#comment-17157669
 ] 

Hadoop QA commented on HDFS-15000:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  9s{color} 
| {color:red} HDFS-15000 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15000 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12989259/HDFS-15000.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/29506/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.









[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2020-07-14 Thread dmichal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157657#comment-17157657
 ] 

dmichal commented on HDFS-15000:


How about distinguishing, in the volume map, between blocks for which the IO 
operation has finished and those for which it is still in progress? Then the 
{{FsDatasetImpl.createRbw()}} method could work in the following way:
{code:java}
lock {
 1. check if block_id exists (either as 'finished' or as 'in progress')
 2. perform other checks
 3. select the volume
 4. add the block to the volume map (as 'in progress')
}
 5. perform the IO
lock {
 6. change the block status to 'finished' or clean up in case of IO error
}
{code}
Maybe the second lock is not even necessary?

Two implementations come to mind (a rough sketch of the first follows the list):
 # Keep the information about the status in the volume map.
 # Create a separate volume map for blocks with IO in progress.
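
A minimal sketch of the first option, assuming a simplified volume map keyed by block 
id; the {{IoState}} enum and the map below are illustrative stand-ins, not the existing 
ReplicaMap API:
{code:java}
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: tracking whether a replica's on-disk IO has completed. */
class ReplicaStateSketch {
  enum IoState { IN_PROGRESS, FINISHED }

  private final ConcurrentHashMap<Long, IoState> volumeMap = new ConcurrentHashMap<>();

  void createRbw(long blockId) throws IOException {
    // Steps 1 and 4: the existence check and the 'in progress' reservation are one
    // atomic operation, so a concurrent writer of the same block id fails immediately.
    IoState previous = volumeMap.putIfAbsent(blockId, IoState.IN_PROGRESS);
    if (previous != null) {
      throw new IOException("Block " + blockId + " already exists (" + previous + ")");
    }
    // Steps 2-3 (other checks, volume selection) would also run before the IO.
    try {
      performIo(blockId);                        // step 5: IO outside any lock
      volumeMap.put(blockId, IoState.FINISHED);  // step 6: mark the replica usable
    } catch (IOException e) {
      volumeMap.remove(blockId);                 // step 6, failure path: drop the reservation
      throw e;
    }
  }

  private void performIo(long blockId) throws IOException {}
}
{code}
With an atomic map the second lock from the pseudo-code collapses into a single 
put/remove, which matches the remark that it may not be needed.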







[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-22 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002126#comment-17002126
 ] 

Xiaoqiao He commented on HDFS-15000:


Thanks [~sodonnell] for your valuable comments. Actually, either approach is fine with 
me, whether we take the IO operation out of the lock or make it async; it may be 
simpler to release the lock before the IO operation and re-take it when the IO 
finishes. Either way, I think we have to keep the same semantics, especially how to 
roll back when the IO operation fails, as mentioned above. Thanks again.
[~Aiphag0] any thoughts here?







[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001227#comment-17001227
 ] 

Hadoop QA commented on HDFS-15000:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 
26s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 32s{color} | {color:orange} root: The patch generated 7 new + 81 unchanged - 
0 fixed = 88 total (was 81) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 17m 
34s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 
29s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 
unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m  
0s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}590m 14s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  1m 
16s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}710m 33s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Unread field:FsDatasetLock.java:[line 36] |
|  |  Condition.await() not in loop in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetLock.releaseLockUntilFinish(Future)
  At 
FsDatasetLock.java:org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetLock.releaseLockUntilFinish(Future)
  At FsDatasetLock.java:[line 50] |
| Failed junit tests | hadoop.hdfs.TestFileStatus |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
|   | hadoop.hdfs.TestQuota |
|   | hadoop.hdfs.TestMultiThreadedHflush |
|   | hadoop.hdfs.TestModTime |
|   | hadoop.hdfs.

[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001206#comment-17001206
 ] 

Hadoop QA commented on HDFS-15000:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 24s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 
44s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 26s{color} | {color:orange} root: The patch generated 7 new + 81 unchanged - 
0 fixed = 88 total (was 81) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 18m 
45s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 
37s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 
unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
55s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}542m 40s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:blue}0{color} | {color:blue} asflicense {color} | {color:blue}  0m 
43s{color} | {color:blue} ASF License check generated no output? {color} |
| {color:black}{color} | {color:black} {color} | {color:black}669m 27s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Wait not in loop in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetLock.releaseLockUntilFinish(Future)
  At 
FsDatasetLock.java:org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetLock.releaseLockUntilFinish(Future)
  At FsDatasetLock.java:[line 51] |
|  |  Wait not in loop in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetLock.waitUntilFinish()
  At 
FsDatasetLock.java:org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetLock.waitUntilFinish()
  At FsDatasetLock.java:[line 63] |
| Failed junit tests | 
hadoop.hdfs.

[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000814#comment-17000814
 ] 

Hadoop QA commented on HDFS-15000:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
51s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
20m 59s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
13s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
37s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  2m 
50s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  2m 50s{color} 
| {color:red} root in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 14s{color} | {color:orange} root: The patch generated 6 new + 81 unchanged - 
0 fixed = 87 total (was 81) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
48s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  4m  
4s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
39s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
38s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 35s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}101m 44s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:e573ea49085 |
| JIRA Issue | HDFS-15000 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12989257/HDFS-15000.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 76ebc61818b8 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f777cd3 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 

[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-20 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000790#comment-17000790
 ] 

Stephen O'Donnell commented on HDFS-15000:
--

There is still a race condition in the createRbw() method with your approach. 
The current implementation checks the volumeMap to ensure the block does not 
already exist, to avoid a duplicate getting created. If you drop the lock before 
updating the volume map, then a duplicate can come in during the async IO phase 
but before the volume map is updated.

I agree my suggestion has the flaw you pointed out, that the volume map would 
now have a reference to a block on disk which does not exist yet and that could 
be a problem too.

With your idea, why do we need the IO to be async? Could it be simplified to:

1. Existing steps in the method
2. Drop the lock, perform the IO in the current thread
3. Re-take the lock and perform the remaining steps (update volume map)

The current thread cannot do anything else while it waits for the IO anyway, so 
perhaps it is simpler to just do the IO on that thread.

We still need to think of a way to keep things consistent when the lock is 
released, which is the hard part.
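
A hedged sketch of that synchronous flow, with a placeholder entry reserved before the 
lock is dropped so the duplicate-block race described above cannot slip in during the 
IO; the lock, map and helper names are stand-ins, not the real FsDatasetImpl members:
{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative only: release the dataset lock around the IO, then re-take it. */
class CreateRbwSketch {
  private final ReentrantLock datasetLock = new ReentrantLock();
  private final Map<Long, Object> volumeMap = new HashMap<>();  // guarded by datasetLock

  Object createRbw(long blockId) throws IOException {
    Object placeholder = new Object();            // stands in for a ReplicaInPipeline

    datasetLock.lock();                           // 1. existing checks under the lock
    try {
      if (volumeMap.containsKey(blockId)) {
        throw new IOException("Block " + blockId + " is already being written");
      }
      // space checks and volume selection would happen here
      volumeMap.put(blockId, placeholder);        // reserve the id before dropping the lock
    } finally {
      datasetLock.unlock();                       // 2. drop the lock ...
    }

    try {
      createFilesOnDisk(blockId);                 // ... and do the IO on the calling thread
    } catch (IOException e) {
      datasetLock.lock();                         // IO failed: undo the reservation
      try {
        volumeMap.remove(blockId, placeholder);
      } finally {
        datasetLock.unlock();
      }
      throw e;
    }

    datasetLock.lock();                           // 3. re-take the lock for the remaining steps
    try {
      // e.g. swap the placeholder for the fully initialized replica info
      return volumeMap.get(blockId);
    } finally {
      datasetLock.unlock();
    }
  }

  private void createFilesOnDisk(long blockId) throws IOException {}
}
{code}
Reserving the entry first removes the duplicate-creation race, at the cost already 
noted: other readers must tolerate, or be prevented from seeing, an entry whose file 
does not exist on disk yet.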







[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-20 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000742#comment-17000742
 ] 

Aiphago commented on HDFS-15000:


Hi [~sodonnell], thanks for your comment. I have some questions about your idea.

??I wonder if it would be possible to refactor things so we do steps 1, 2, 3 
and 5, drop the lock and then do the IO operation to actually create the file. 
In the event the IO fails, re-take the lock and clean up the volume map.??

If we do the IO after step 5, the volume map will contain the replica info while the 
IO may not be done yet. If some error happens during the IO, we have to take the lock 
again to clean up, so the volume map will hold this replica for a long time; this may 
cause a consistency problem when another thread reads this replica.

My thoughts (a rough sketch follows the list):
 # Do not change the sequence of steps within each method.
 # Make the IO async, and release the lock while waiting until another thread signals 
this thread that the IO has finished.
 # Keep the IO operations in order across the different methods such as #finalizeBlock, 
#finalizeReplica and #createRbw.
 # After the IO operation completes, update the volume map.
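
A speculative sketch of that flow: a single-threaded executor per volume keeps the 
submitted IO in order, and the caller releases the dataset lock while it waits for the 
IO to complete, then re-takes it to update the map. The names below are illustrative; 
they are not the FsDatasetLock API from the attached patch:
{code:java}
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative only: per-volume ordered IO executed while the dataset lock is released. */
class AsyncIoSketch {
  private final ReentrantLock datasetLock = new ReentrantLock();
  // One single-threaded executor per volume keeps that volume's IO in submission order.
  private final ExecutorService volumeIoExecutor = Executors.newSingleThreadExecutor();

  void finalizeReplica(long blockId) throws IOException {
    datasetLock.lock();
    try {
      // In-memory checks happen under the lock, then the IO goes to the ordered executor.
      Future<?> io = volumeIoExecutor.submit(() -> doFinalizeIo(blockId));

      datasetLock.unlock();                  // release the lock while the IO runs
      try {
        io.get();                            // wait until the IO thread signals completion
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IOException("Interrupted while waiting for IO", e);
      } catch (ExecutionException e) {
        throw new IOException("IO failed for block " + blockId, e.getCause());
      } finally {
        datasetLock.lock();                  // re-take the lock before touching the map
      }

      updateVolumeMap(blockId);              // only after the IO has finished
    } finally {
      datasetLock.unlock();
    }
  }

  private void doFinalizeIo(long blockId) { /* move the block and meta files to finalized */ }

  private void updateVolumeMap(long blockId) {}
}
{code}
If #createRbw, #finalizeBlock and #finalizeReplica all route their disk work for a 
volume through the same executor, those operations stay in submission order even 
though the dataset lock is not held across them.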

 

 







[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-19 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000451#comment-17000451
 ] 

Hadoop QA commented on HDFS-15000:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} HDFS-15000 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15000 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12989167/HDFS-15000.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28545/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.









[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-19 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1706#comment-1706
 ] 

Stephen O'Donnell commented on HDFS-15000:
--

I wonder if this could be done by simply releasing the lock, doing the IO and 
re-taking the lock, avoiding the need for futures etc.

I have not looked at this in great detail, so my suggestion may have some flaws. 
Looking at FsDatasetImpl.createRbw(), it does approximately the following:

{code}
lock {
  1. check_block_id does not already exist
  2. check enough space available etc
  3. select the volume

  4. Perform the IO via newReplicaInfo = v.createRbw(b);

  5. Add the new block to the volume map
}
{code}

A problem with dropping the lock in the middle while doing the IO is that 
another thread could come in with the same block ID, and it would pass check 
(1) above, and then we would have a race condition.

I wonder if it would be possible to refactor things so we do steps 1, 2, 3 and 
5, drop the lock and then do the IO operation to actually create the file. In 
the event the IO fails, re-take the lock and clean up the volume map.

This would require some refactoring of a few methods, as the volume map needs 
to store a reference to the replicaInfo, which currently is created in 
v.createRbw(b) along with the file on disk, but I don't think it needs to be - 
we could break those two apart.
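
A speculative sketch of that split: the replica object is pure metadata and can be 
built (and put into the volume map) under the lock, while the file creation moves to a 
separate step run after the lock is released. The class and path below are simplified 
stand-ins for ReplicaInPipeline and the volume's rbw directory:
{code:java}
import java.io.File;
import java.io.IOException;

/** Illustrative only: separate replica-info construction from on-disk file creation. */
class SplitCreateRbwSketch {
  /** Stand-in for ReplicaInPipeline: metadata only, so it can be created under the lock. */
  static class ReplicaInfoStub {
    final long blockId;
    final File blockFile;

    ReplicaInfoStub(long blockId, File blockFile) {
      this.blockId = blockId;
      this.blockFile = blockFile;
    }
  }

  /** Today v.createRbw(b) builds the replica info and the file together; splitting them
      lets the caller register the ReplicaInfoStub in the volume map before any IO. */
  ReplicaInfoStub prepareRbw(long blockId) {
    File blockFile = new File("/data/current/rbw/blk_" + blockId);  // hypothetical layout
    return new ReplicaInfoStub(blockId, blockFile);                 // no IO here
  }

  /** The actual IO, run after the dataset lock has been released. */
  void materializeRbw(ReplicaInfoStub replica) throws IOException {
    if (!replica.blockFile.createNewFile()) {
      throw new IOException("Failed to create " + replica.blockFile);
    }
  }
}
{code}
On an IO failure the caller would re-take the lock and remove the stub from the volume 
map, as described above.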







[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-18 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999789#comment-16999789
 ] 

Xiaoqiao He commented on HDFS-15000:


Thanks [~Aiphag0], it's a very good start. Some suggestions:
a. Please rebase against the current codebase and then submit a new patch.
b. We should consider how to roll back the meta information in {{volumeMap}} when the 
IoTasker fails.
c. Since these are core changes, we should add enough unit tests to cover them.







[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-18 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999776#comment-16999776
 ] 

Aiphago commented on HDFS-15000:


Submitted a demo patch. The main idea is to perform the IO operations (which may have 
ordering dependencies) asynchronously, outside the lock, while keeping those operations 
in order.







[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-11-21 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979616#comment-16979616
 ] 

Wei-Chiu Chuang commented on HDFS-15000:


HDFS-9668 might also help.







[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-11-21 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979498#comment-16979498
 ] 

Wei-Chiu Chuang commented on HDFS-15000:


Great idea. It reminds me of HDFS-8496, where we forcefully stop DataNode 
threads if they hold the lock for too long.

It also reminds me of HDFS-11187, which eliminates a lock held by the block 
sender while it reads a block.



