[jira] [Commented] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371052#comment-16371052 ]

genericqa commented on HDFS-13056:
----------------------------------

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for branch |
| +1 | mvninstall | 17m 27s | trunk passed |
| +1 | compile | 13m 45s | trunk passed |
| +1 | checkstyle | 2m 11s | trunk passed |
| +1 | mvnsite | 2m 58s | trunk passed |
| +1 | shadedclient | 15m 47s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 5m 4s | trunk passed |
| +1 | javadoc | 2m 26s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 16s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 22s | the patch passed |
| +1 | compile | 12m 25s | the patch passed |
| +1 | cc | 12m 25s | the patch passed |
| -1 | javac | 12m 25s | root generated 1 new + 1231 unchanged - 0 fixed = 1232 total (was 1231) |
| -0 | checkstyle | 2m 17s | root: The patch generated 175 new + 609 unchanged - 1 fixed = 784 total (was 610) |
| +1 | mvnsite | 2m 54s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 10m 12s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 5m 28s | the patch passed |
| +1 | javadoc | 2m 27s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 9m 2s | hadoop-common in the patch passed. |
| +1 | unit | 1m 30s | hadoop-hdfs-client in the patch passed. |
| -1 | unit | 114m 18s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 51s | The patch does not generate ASF License warnings. |
| | | 221m 40s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13056 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911312/HDFS-13056.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite
[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-11187: - Resolution: Fixed Fix Version/s: 2.7.6 Status: Resolved (was: Patch Available) Pushed to branch-2.7! Thanks Gabor and Erik > Optimize disk access for last partial chunk checksum of Finalized replica > - > > Key: HDFS-11187 > URL: https://issues.apache.org/jira/browse/HDFS-11187 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2 > > Attachments: HDFS-11187-branch-2.001.patch, > HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, > HDFS-11187-branch-2.004.patch, HDFS-11187-branch-2.7.001.patch, > HDFS-11187.001.patch, HDFS-11187.002.patch, HDFS-11187.003.patch, > HDFS-11187.004.patch, HDFS-11187.005.patch > > > The patch at HDFS-11160 ensures BlockSender reads the correct version of > metafile when there are concurrent writers. > However, the implementation is not optimal, because it must always read the > last partial chunk checksum from disk while holding FsDatasetImpl lock for > every reader. It is possible to optimize this by keeping an up-to-date > version of last partial checksum in-memory and reduce disk access. > I am separating the optimization into a new jira, because maintaining the > state of in-memory checksum requires a lot more work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
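The optimization described in HDFS-11187 (keep the last partial chunk checksum in memory so readers avoid a disk read of the meta file under the FsDatasetImpl lock) can be sketched as follows. This is a hedged illustration of the idea only, not the committed patch; the class and method names here are hypothetical:

```java
import java.util.Arrays;

// Hypothetical sketch of the HDFS-11187 idea: cache the checksum of the last
// partial chunk on the finalized-replica object so readers are served from
// memory instead of re-reading the meta file for every request.
public class FinalizedReplicaSketch {
    private byte[] lastPartialChunkChecksum; // null => block ends on a full chunk

    // Writer path: update the cache whenever the partial chunk's checksum changes.
    public synchronized void setLastPartialChunkChecksum(byte[] checksum) {
        this.lastPartialChunkChecksum =
            (checksum == null) ? null : Arrays.copyOf(checksum, checksum.length);
    }

    // Reader path (BlockSender-style): a defensive copy keeps callers from
    // mutating the cached value.
    public synchronized byte[] getLastPartialChunkChecksum() {
        return (lastPartialChunkChecksum == null)
            ? null
            : Arrays.copyOf(lastPartialChunkChecksum, lastPartialChunkChecksum.length);
    }
}
```

A writer updates the cached value whenever the partial chunk grows; readers then obtain the checksum without touching disk, which is the disk-access reduction the issue describes.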
[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371050#comment-16371050 ] Xiao Chen commented on HDFS-11187: -- Failed tests look unrelated. checkstyle and whitespace are related but trivial, I'll fix those at commit time. +1 on branch-2.7 patch, committing
[jira] [Commented] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
[ https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371043#comment-16371043 ] Elek, Marton commented on HDFS-13108: - The one checkstyle issue is also fixed... > Ozone: OzoneFileSystem: Simplified url schema for Ozone File System > --- > > Key: HDFS-13108 > URL: https://issues.apache.org/jira/browse/HDFS-13108 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone > Affects Versions: HDFS-7240 > Reporter: Elek, Marton > Assignee: Elek, Marton > Priority: Major > Attachments: HDFS-13108-HDFS-7240.001.patch, > HDFS-13108-HDFS-7240.002.patch, HDFS-13108-HDFS-7240.003.patch, > HDFS-13108-HDFS-7240.005.patch, HDFS-13108-HDFS-7240.006.patch, > HDFS-13108-HDFS-7240.007.patch > > > A. Current state > > 1. The datanode host / bucket / volume should be defined in the defaultFS (eg. > o3://datanode:9864/test/bucket1) > 2. The root file system points to the bucket (eg. 'dfs -ls /' lists all the > keys from bucket1) > It works very well, but there are some limitations. > B. Problem one > The current code doesn't support fully qualified locations. For example 'dfs > -ls o3://datanode:9864/test/bucket1/dir1' does not work. > C. Problem two > I tried to fix the previous problem, but it's not trivial. The biggest > problem is that there is a Path.makeQualified call which can transform an > unqualified URL into a qualified URL. This is part of Path.java, so it's > common to all the Hadoop file systems. > In the current implementation it qualifies a URL by keeping the schema > (eg. o3://) and authority (eg. datanode:9864) from the defaultFS and using > the relative path as the end of the qualified URL. For example: > makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) will > return o3://datanode:9864/dir1/file, which is obviously wrong (the correct result would > be o3://datanode:9864/TEST/BUCKET1/dir1/file). I tried a workaround > using a custom makeQualified in the Ozone code, and it worked from the > command line but not with Spark, which uses the Hadoop API and the > original makeQualified path. > D. Solution > We should support makeQualified calls, so we can use any path in the > defaultFS. > > I propose to use a simplified schema such as o3://bucket.volume/ > This is similar to the s3a format, where the pattern is s3a://bucket.region/ > We don't need to set the hostname of the datanode (or the KSM, in case of service > discovery), but it would be configurable with additional Hadoop configuration > values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 > (this is how s3a works today, as far as I know). > We also need to define restrictions for the volume names (in our case they > must not include a dot any more). > ps: some spark output > 2018-02-03 18:43:04 WARN Client:66 - Neither spark.yarn.jars nor > spark.yarn.archive is set, falling back to uploading libraries under > SPARK_HOME. > 2018-02-03 18:43:05 INFO Client:54 - Uploading resource > file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__244044896784490.zip > -> > o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__244044896784490.zip > My default fs was o3://datanode:9864/test/bucket1, but Spark qualified the > name of the home directory.
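The makeQualified behavior described above can be reproduced with a small model. This is a deliberately simplified sketch of the qualification step (scheme and authority taken from the default FS URI, the path kept as-is), not the actual Path.java implementation; the class name is hypothetical:

```java
import java.net.URI;

// Simplified model of qualification: only the scheme and authority of the
// default FS URI survive; the bucket/volume part of its *path* is dropped,
// which is exactly the bug described above.
public class MakeQualifiedSketch {
    static URI makeQualified(URI defaultUri, String path) {
        // assumes 'path' is already absolute; a relative path would first be
        // resolved against the working directory
        return URI.create(defaultUri.getScheme() + "://"
            + defaultUri.getAuthority() + path);
    }

    public static void main(String[] args) {
        URI defaultUri = URI.create("o3://datanode:9864/test/bucket1");
        // Prints o3://datanode:9864/dir1/file -- the /test/bucket1 prefix is lost.
        System.out.println(makeQualified(defaultUri, "/dir1/file"));
    }
}
```

Because only scheme and authority survive qualification, moving the bucket and volume into the authority (o3://bucket.volume/) makes makeQualified safe by construction, which is the motivation for the proposed schema.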
[jira] [Updated] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
[ https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton updated HDFS-13108: Attachment: HDFS-13108-HDFS-7240.007.patch
[jira] [Commented] (HDFS-13165) [SPS]: Collects successfully moved block details via IBR
[ https://issues.apache.org/jira/browse/HDFS-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371019#comment-16371019 ] Rakesh R commented on HDFS-13165: - Attached another patch fixing:
- checkstyle
- whitespace
- the {{TestStoragePolicySatisfier#testSPSWhenFileHasExcessRedundancyBlocks}} test failure
> [SPS]: Collects successfully moved block details via IBR > > > Key: HDFS-13165 > URL: https://issues.apache.org/jira/browse/HDFS-13165 > Project: Hadoop HDFS > Issue Type: Sub-task > Reporter: Rakesh R > Assignee: Rakesh R > Priority: Major > Attachments: HDFS-13165-HDFS-10285-00.patch, > HDFS-13165-HDFS-10285-01.patch, HDFS-13165-HDFS-10285-02.patch > > > This task is to make use of the existing IBR to get moved-block details and to remove the > unwanted future-tracking logic that exists in the BlockStorageMovementTracker > code; it is no longer needed because file-level tracking is maintained at the NN > itself. > The following comments are taken from HDFS-10285, > [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16347472] > Comment-3) > {quote}BPServiceActor > Is it actually sending back the moved blocks? Aren’t IBRs sufficient?{quote} > Comment-21) > {quote} > BlockStorageMovementTracker > Many data structures are riddled with non-threadsafe race conditions and risk > of CMEs. > Ex. the moverTaskFutures map. Adding new blocks and/or adding to a block's > list of futures is synchronized. However the run loop does an unsynchronized > block get, unsynchronized future remove, unsynchronized isEmpty, possibly > another unsynchronized get, and only then does it do a synchronized remove of the > block. The whole chunk of code should be synchronized. > Is the problematic moverTaskFutures even needed? It's aggregating futures > per-block for seemingly no reason. Why track all the futures at all instead > of just relying on the completion service? As best I can tell: > It's only used to determine if a future from the completion service should be > ignored during shutdown. Shutdown sets the running boolean to false and > clears the entire data structure, so why not use the running boolean as a > check just a little further down? > And as synchronization to sleep up to 2 seconds before performing a blocking > moverCompletionService.take, but only when it thinks there are no active > futures. I'll ignore the missed-notify race that the bounded wait masks, but > the real question is why not just do the blocking take? > Why all the complexity? Am I missing something? > BlocksMovementsStatusHandler > Suffers the same type of thread-safety issues as StoragePolicySatisfyWorker. Ex. > blockIdVsMovementStatus is inconsistently synchronized. It does synchronize to > return an unmodifiable list, which sadly does nothing to protect the caller > from CME. > handle is iterating over a non-thread-safe list. > {quote}
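The reviewer's "just rely on the completion service" suggestion quoted above can be sketched like this. A hedged, hypothetical illustration (names invented for the sketch), not the actual SPS code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of the review suggestion: submit block-move tasks to an
// ExecutorCompletionService and block on take(), instead of aggregating
// per-block futures in a shared map with unsynchronized get/remove/isEmpty.
public class MoverLoopSketch {
    static List<String> drainMoves(List<Callable<String>> moveTasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CompletionService<String> completion = new ExecutorCompletionService<>(pool);
        for (Callable<String> task : moveTasks) {
            completion.submit(task);
        }
        List<String> results = new ArrayList<>();
        for (int i = 0; i < moveTasks.size(); i++) {
            // take() blocks until some move finishes: no bounded wait, no
            // missed-notify race, and no futures map to clean up on shutdown.
            results.add(completion.take().get());
        }
        pool.shutdown();
        return results;
    }
}
```

All completion state lives in the ExecutorCompletionService's internal queue, so there is no shared data structure left to synchronize, which is the simplification the review comment asks about.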
[jira] [Updated] (HDFS-13165) [SPS]: Collects successfully moved block details via IBR
[ https://issues.apache.org/jira/browse/HDFS-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-13165: Attachment: HDFS-13165-HDFS-10285-02.patch
[jira] [Commented] (HDFS-13040) Kerberized inotify client fails despite kinit properly
[ https://issues.apache.org/jira/browse/HDFS-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370999#comment-16370999 ] Xiao Chen commented on HDFS-13040: -- Thanks for the review Daryn. Patch 5 is attached to address the comments, except the 'current user' one. I agree it's the most correct thing to do, but maybe we can leave it to a future jira. {quote} floating the doAs login user up to {{getEditsFromTxid}} {quote} Good idea, done this way and left the stream class untouched. {quote}Could the unit test just explicitly set the conf keys {quote} Not really, because the journal part of the QJM HA cluster needs to be started first for us to know the correct journal URI, so we can't know the URI beforehand. {{initHAConf}} currently sets the shared edits dir key, presumably for the same reason. {quote}the test {quote} Good catch, and helpful explanations. Addressed by using the correct UGIs: hdfs@ is the client, and hdfs/localhost@ is the NN user. Verified I can see the big beautiful GSSAPI stack trace without the fix. One odd thing I found in the test, though, is that I had to set the proxy users for it to work; otherwise the mkdirs after the relogin would throw {quote}AuthorizationException): User: hdfs/localh...@example.com is not allowed to impersonate h...@example.com {quote} at me. Debugging this, it seems to be a designed RPC server auth behavior, from this [code|https://github.com/apache/hadoop/blob/121e1e1280c7b019f6d2cc3ba9eae1ead0dd8408/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java#L2260].
Though my debugging shows the {{protocolUser}} is {{hdfs@ (auth:SIMPLE)}}, while the {{realUser}} is {{hdfs/localhost@ (auth:KERBEROS)}}, still weird.
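The "floating the doAs login user up to getEditsFromTxid" refactor discussed in the comment can be modeled abstractly. This hedged sketch stands in for Hadoop's UserGroupInformation with a plain functional interface; every name in it is hypothetical and the body of the fetch is a placeholder:

```java
import java.util.concurrent.Callable;

// Hypothetical model of the suggested structure: rather than switching
// identities deep inside the edit-log stream classes, the entry point wraps
// the whole fetch in a single runAs, so every journal read below it executes
// as the NameNode's own login identity instead of the remote caller's.
public class EditsFetchSketch {
    interface IdentityRunner {
        <T> T runAs(Callable<T> action) throws Exception;
    }

    static String getEditsFromTxid(IdentityRunner loginUser, long txid) throws Exception {
        return loginUser.runAs(() -> {
            // placeholder for: open journal streams, read ops >= txid
            return "edits-from-" + txid;
        });
    }
}
```

The stream classes below the entry point stay identity-agnostic; only the top-level call decides which user performs the reads, which matches "done this way and left the stream class untouched" above.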
[jira] [Updated] (HDFS-13040) Kerberized inotify client fails despite kinit properly
[ https://issues.apache.org/jira/browse/HDFS-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-13040: - Attachment: HDFS-13040.05.patch > Kerberized inotify client fails despite kinit properly > -- > > Key: HDFS-13040 > URL: https://issues.apache.org/jira/browse/HDFS-13040 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode > Affects Versions: 2.6.0 > Environment: Kerberized, HA cluster, iNotify client, CDH5.10.2 > Reporter: Wei-Chiu Chuang > Assignee: Wei-Chiu Chuang > Priority: Major > Attachments: HDFS-13040.001.patch, HDFS-13040.02.patch, > HDFS-13040.03.patch, HDFS-13040.04.patch, HDFS-13040.05.patch, > HDFS-13040.half.test.patch, TestDFSInotifyEventInputStreamKerberized.java, > TransactionReader.java > > > This issue is similar to HDFS-10799. > HDFS-10799 turned out to be a client-side issue where the client is responsible > for actively renewing its Kerberos ticket. > However, we found that in a slightly different setup, even if the client has valid Kerberos > credentials, inotify still fails. > Suppose the client uses principal h...@example.com, > namenode 1 uses server principal hdfs/nn1.example@example.com, and > namenode 2 uses server principal hdfs/nn2.example@example.com. > *After the NameNodes have been up for longer than the Kerberos ticket lifetime*, the client > fails with the following error: > {noformat} > 18/01/19 11:23:02 WARN security.UserGroupInformation: > PriviledgedActionException as:h...@gce.cloudera.com (auth:KERBEROS) > cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): We > encountered an error reading > https://nn2.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3, > > https://nn1.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3. > During automatic edit log failover, we noticed that all of the remaining > edit log streams are shorter than the current one! The best remaining edit > log ends at transaction 8683, but we thought we could read up to transaction > 8684. 
If you continue, metadata will be lost forever! > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1701) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1763) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1011) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1490) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210) > {noformat} > Typically if NameNode has an expired Kerberos ticket, the error handling for > the typical edit log tailing would let NameNode to relogin with its own > Kerberos principal. However, when inotify uses the same code path to retrieve > edits, since the current user is the inotify client's principal, unless > client uses the same principal as the NameNode, NameNode can't do it on > behalf of the client. 
> Therefore, a more appropriate approach is to use proxy user so that NameNode > can retrieving edits on behalf of the client. > I will attach a patch to fix it. This patch has been verified to work for a > CDH5.10.2 cluster, however it seems impossible to craft a unit test for this > fix because the way Hadoop UGI handles Kerberos credentials (I can't have a > single process that logins as two Kerberos principals simultaneously and let > them establish connection) > A possible workaround is for the inotify client to use the active NameNode's > server principal. However, that's not going to work when there's a namenode > failover, because then the client's principal will not be
[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370929#comment-16370929 ] Hudson commented on HDFS-13175: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13692 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13692/]) HDFS-13175. Add more information for checking argument in (aengineer: rev 121e1e1280c7b019f6d2cc3ba9eae1ead0dd8408) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/diskbalancer/connectors/DBNameNodeConnector.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/diskbalancer/datamodel/DiskBalancerVolume.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/diskbalancer/command/PlanCommand.java > Add more information for checking argument in DiskBalancerVolume > > > Key: HDFS-13175 > URL: https://issues.apache.org/jira/browse/HDFS-13175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: diskbalancer > Affects Versions: 3.0.0 > Reporter: Lei (Eddy) Xu > Assignee: Lei (Eddy) Xu > Priority: Minor > Fix For: 3.1.0, 3.0.2 > > Attachments: HDFS-13175.00.patch, HDFS-13175.01.patch > > > We have seen the following stack in production > {code} > Exception in thread "main" java.lang.IllegalArgumentException > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at > org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268) > at > org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141) > at > org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90) > at > org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132) > at > 
org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123) > at > org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107) > {code} > raised from > {code} > public void setUsed(long dfsUsedSpace) { > Preconditions.checkArgument(dfsUsedSpace < this.getCapacity()); > this.used = dfsUsedSpace; > } > {code} > However, the datanode reports at the very moment were not captured. We should > add more information into the stack trace to better diagnose the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
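The shape of the fix being discussed can be sketched without any Hadoop or Guava dependency: keep the same precondition, but put the observed values into the exception message so a production stack trace like the one above is self-diagnosing. The class and message text below are illustrative, not the committed patch:

```java
// Minimal, hypothetical stand-in for DiskBalancerVolume.setUsed (not the
// committed Hadoop code): the precondition failure now reports the values
// that triggered it instead of a bare IllegalArgumentException.
public class VolumeUsageDemo {
    private final long capacity;
    private long used;

    public VolumeUsageDemo(long capacity) {
        this.capacity = capacity;
    }

    public void setUsed(long dfsUsedSpace) {
        // Same check as before (dfsUsedSpace must be < capacity), but the
        // message carries the numbers needed to diagnose a bad report.
        if (dfsUsedSpace >= capacity) {
            throw new IllegalArgumentException(String.format(
                "dfsUsedSpace %d is not less than capacity %d",
                dfsUsedSpace, capacity));
        }
        this.used = dfsUsedSpace;
    }

    public long getUsed() {
        return used;
    }

    public static void main(String[] args) {
        VolumeUsageDemo vol = new VolumeUsageDemo(100L);
        vol.setUsed(42L); // accepted
        try {
            vol.setUsed(150L); // a datanode report exceeding capacity
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
            // prints: dfsUsedSpace 150 is not less than capacity 100
        }
    }
}
```

Guava offers the same in place via its templated overload, `Preconditions.checkArgument(boolean, String, Object...)` with `%s` placeholders, so the existing call site only needs a message and arguments added.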
[jira] [Updated] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anu Engineer updated HDFS-13175:
Resolution: Fixed
Hadoop Flags: Reviewed
Fix Version/s: 3.0.2, 3.1.0
Status: Resolved (was: Patch Available)

[~eddyxu] Thank you for the contribution. I have committed this to 3.0, 3.1, and trunk.
[jira] [Updated] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Huo updated HDFS-13056:
--
Status: Patch Available (was: Open)

> Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
> ------------------------------------------------------------------------------------------------
> Key: HDFS-13056
> URL: https://issues.apache.org/jira/browse/HDFS-13056
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, distcp, erasure-coding, federation, hdfs
> Affects Versions: 3.0.0
> Reporter: Dennis Huo
> Priority: Major
> Attachments: HDFS-13056-branch-2.8.001.patch, HDFS-13056-branch-2.8.poc1.patch, HDFS-13056.001.patch, HDFS-13056.002.patch, Reference_only_zhen_PPOC_hadoop2.6.X.diff, hdfs-file-composite-crc32-v1.pdf, hdfs-file-composite-crc32-v2.pdf, hdfs-file-composite-crc32-v3.pdf
>
> FileChecksum was first introduced in [https://issues-test.apache.org/jira/browse/HADOOP-3981] and has ever since remained defined as MD5-of-MD5-of-CRC: per-512-byte chunk CRCs are already stored as part of datanode metadata, and the MD5 approach is used to compute an aggregate value in a distributed manner, with individual datanodes computing the MD5-of-CRCs per block in parallel and the HDFS client computing the second-level MD5.
>
> An often-raised shortcoming of this approach is that the FileChecksum is sensitive to the internal block-size and chunk-size configuration, so HDFS files with different block/chunk settings cannot be compared. More commonly, one might have different HDFS clusters that use different block sizes, in which case a data migration cannot use the FileChecksum for distcp's rsync functionality or for verifying end-to-end data integrity (on top of the low-level integrity checks applied at data transfer time).
>
> This was also revisited in https://issues.apache.org/jira/browse/HDFS-8430 during the addition of checksum support for striped erasure-coded files; while there was some discussion of using CRC composability, it ultimately settled on the hierarchical MD5 approach, which adds the further problem that checksums of basic replicated files are not comparable to those of striped files.
>
> This feature proposes a "COMPOSITE-CRC" FileChecksum type which uses CRC composition to remain completely chunk/block agnostic, allowing comparison between striped and replicated files, between different HDFS instances, and possibly even between HDFS and other external storage systems. The feature can be added in place, compatible with existing block metadata, and does not change the normal path of chunk verification, so it is minimally invasive. This also means even large preexisting HDFS deployments could adopt it to retroactively sync data. A detailed design document can be found here: https://storage.googleapis.com/dennishuo/hdfs-file-composite-crc32-v1.pdf
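The composability the proposal relies on, namely combining the CRCs of two adjacent byte ranges into the CRC of their concatenation without re-reading any data, can be sketched as a port of zlib's well-known crc32_combine: it "appends" len2 zero bytes to crc1 by GF(2) matrix exponentiation over the reflected CRC-32 polynomial, then XORs in crc2. This is a conceptual illustration of the math only, not the CrcComposer/CrcUtil code from the actual patch:

```java
import java.util.zip.CRC32;

// Conceptual sketch of CRC-32 composition (a Java port of zlib's
// crc32_combine), illustrating the block-layout-agnostic property that
// HDFS-13056 builds on. Not the Hadoop implementation.
public class Crc32Combine {
    private static final int GF2_DIM = 32;

    // Multiply a GF(2) 32x32 matrix by a 32-bit vector.
    private static long gf2MatrixTimes(long[] mat, long vec) {
        long sum = 0;
        int i = 0;
        while (vec != 0) {
            if ((vec & 1) != 0) {
                sum ^= mat[i];
            }
            vec >>>= 1;
            i++;
        }
        return sum;
    }

    // square = mat * mat in GF(2).
    private static void gf2MatrixSquare(long[] square, long[] mat) {
        for (int n = 0; n < GF2_DIM; n++) {
            square[n] = gf2MatrixTimes(mat, mat[n]);
        }
    }

    /** Combine crc1 with crc2, where crc2 covers len2 bytes that follow crc1's range. */
    public static long combine(long crc1, long crc2, long len2) {
        if (len2 <= 0) {
            return crc1; // degenerate case: nothing to append
        }
        long[] even = new long[GF2_DIM];
        long[] odd = new long[GF2_DIM];
        odd[0] = 0xEDB88320L; // operator for one zero bit (reflected CRC-32 polynomial)
        long row = 1;
        for (int n = 1; n < GF2_DIM; n++) {
            odd[n] = row;
            row <<= 1;
        }
        gf2MatrixSquare(even, odd); // two zero bits
        gf2MatrixSquare(odd, even); // four zero bits
        do { // apply len2 zero bytes to crc1, one bit of len2 at a time
            gf2MatrixSquare(even, odd);
            if ((len2 & 1) != 0) {
                crc1 = gf2MatrixTimes(even, crc1);
            }
            len2 >>>= 1;
            if (len2 == 0) {
                break;
            }
            gf2MatrixSquare(odd, even);
            if ((len2 & 1) != 0) {
                crc1 = gf2MatrixTimes(odd, crc1);
            }
            len2 >>>= 1;
        } while (len2 != 0);
        return crc1 ^ crc2;
    }

    public static void main(String[] args) {
        byte[] a = "hello ".getBytes();
        byte[] b = "world".getBytes();
        CRC32 ca = new CRC32(); ca.update(a);
        CRC32 cb = new CRC32(); cb.update(b);
        CRC32 cab = new CRC32(); cab.update(a); cab.update(b);
        long combined = combine(ca.getValue(), cb.getValue(), b.length);
        // The composed CRC equals the CRC of the concatenation.
        System.out.println(combined == cab.getValue());
    }
}
```

Because composition depends only on the trailing length, per-chunk or per-block CRCs can be folded together in order regardless of how the bytes were split, which is exactly why a composite CRC is insensitive to block size, chunk size, or striped-vs-replicated layout.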
[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370896#comment-16370896 ] Anu Engineer commented on HDFS-13175: - I will commit this shortly.
[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370886#comment-16370886 ]

genericqa commented on HDFS-13175:
--

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 5s | trunk passed |
| +1 | compile | 1m 4s | trunk passed |
| +1 | checkstyle | 0m 43s | trunk passed |
| +1 | mvnsite | 1m 9s | trunk passed |
| +1 | shadedclient | 11m 51s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 0s | trunk passed |
| +1 | javadoc | 1m 3s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 10s | the patch passed |
| +1 | compile | 1m 0s | the patch passed |
| +1 | javac | 1m 0s | the patch passed |
| +1 | checkstyle | 0m 38s | the patch passed |
| +1 | mvnsite | 0m 59s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 15s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 6s | the patch passed |
| +1 | javadoc | 0m 52s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 125m 31s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 23s | The patch does not generate ASF License warnings. |
| | | 178m 47s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13175 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911296/HDFS-13175.01.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f58d0d8187b2 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6f81cc0 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23136/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23136/testReport/ |
| Max. process+thread count |
[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370856#comment-16370856 ] Anu Engineer commented on HDFS-13175: - +1, patch v1, pending jenkins.
[jira] [Updated] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Huo updated HDFS-13056: -- Attachment: HDFS-13056.002.patch
[jira] [Commented] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370820#comment-16370820 ]

ASF GitHub Bot commented on HDFS-13056:
---

GitHub user dennishuo opened a pull request:

https://github.com/apache/hadoop/pull/344

HDFS-13056. Add support for a new COMPOSITE_CRC FileChecksum which is comparable between different block layouts and between striped/replicated files

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dennishuo/hadoop add-composite-crc32

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/344.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #344

commit de06097fa2f4c511d5a107d997c7dfa5862ada82
Author: Dennis Huo
Date: 2018-01-24T23:04:29Z

    Add support for a new COMPOSITE_CRC FileChecksum. Adds new file-level ChecksumCombineMode options settable through config and lower-level BlockChecksumOptions to indicate block-checksum types supported by both blockChecksum and blockGroupChecksum in DataTransferProtocol. CRCs are composed such that they are agnostic to block/chunk/cell layout and thus can be compared between replicated files and striped files of different underlying blocksize, bytes-per-crc, and cellSize settings. Does not alter default behavior, and doesn't touch the data-read or data-write paths at all.

commit 3f8fd5ef9da8c312f60430622d3c95f80cb1fde2
Author: Dennis Huo
Date: 2018-02-08T00:21:14Z

    Fix byte-length property for CRC FileChecksum.

commit 1a326e38505bacd6b40a682668f36c2aa1047f86
Author: Dennis Huo
Date: 2018-02-19T02:53:03Z

    Add unittest for CrcUtil. Minor optimization by starting the multiplier at x^8, and fix the behavior of composing a zero-length crcB.

commit d7c2bc739f3cff0d8ae72bb4f2a940eb5b733279
Author: Dennis Huo
Date: 2018-02-20T00:47:50Z

    Refactor StripedBlockChecksumReconstructor for easier reuse with COMPOSITE_CRC. Update BlockChecksumHelper's CRC composition to use the same data buffer used in the MD5 case, and factor out shared logic from StripedBlockChecksumReconstructor into an abstract base class so that reconstruction logic can be shared between MD5CRC and COMPOSITE_CRC.

commit ac38f404f1d15c9846f58acf297c7e242c3f8bce
Author: Dennis Huo
Date: 2018-02-20T03:05:41Z

    Extract a helper class CrcComposer. Encapsulate all the CRC internals, such as tracking the CRC polynomial and precomputing the monomial, into this class so that BlockChecksumHelper and FileChecksumHelper only need to interact with the clean interfaces of CrcComposer.

commit 8f7b9fd6f93c8358dd0c4899e41d2a993bcc6294
Author: Dennis Huo
Date: 2018-02-20T03:40:33Z

    Add StripedBlockChecksumCompositeCrcReconstructor. Wire it into BlockChecksumHelper and use CrcComposer to regenerate striped composite CRCs for missing EC data blocks.

commit fd2fc3408346aeb177eaeda50919995ee3c02cab
Author: Dennis Huo
Date: 2018-02-20T21:56:07Z

    Add end-to-end test coverage for COMPOSITE_CRC. Extract hooks in TestFileChecksum to allow a subclass to share core tests while modifying expectations for a subset of tests; add TestFileChecksumCompositeCrc, which extends TestFileChecksum to apply the same test suite to COMPOSITE_CRC, and add a test case comparing two replicated files with different block sizes. The test confirms that MD5CRC yields different checksums between replicated and striped files, and between two replicated files with different block sizes, while COMPOSITE_CRC yields the same checksum in all cases.

commit 5cd2d08f2be672e79d931ebb6f89541f38334f0b
Author: Dennis Huo
Date: 2018-02-20T23:44:11Z

    Add unittest for CrcComposer. Fix a bug in handling byte-array updates with nonzero offset.

commit e65248b077d4e1ad00888112de877afed86dad03
Author: Dennis Huo
Date: 2018-02-21T00:08:05Z

    Remove STRIPED_CRC as a BlockChecksumType. Refactor to just use stripeLength with COMPOSITE_CRC, where non-striped COMPOSITE_CRC is just an edge case in which stripeLength is longer than the data range.

commit c2a7701246c07a4906d7540d6bc496364239dafc
Author: Dennis Huo
Date: 2018-02-21T01:02:08Z

    Support file-attribute propagation of bytePerCrc in CompositeCrcFileChecksum. Additionally, fix up remaining TODOs; add wrappers for late-evaluating hex format of CRCs to pass into debug statements, and clean up logging logic.
[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370819#comment-16370819 ]

genericqa commented on HDFS-13175:
--

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 55s | trunk passed |
| +1 | compile | 1m 0s | trunk passed |
| +1 | checkstyle | 0m 41s | trunk passed |
| +1 | mvnsite | 1m 5s | trunk passed |
| +1 | shadedclient | 12m 2s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 3s | trunk passed |
| +1 | javadoc | 0m 55s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 4s | the patch passed |
| +1 | compile | 1m 1s | the patch passed |
| +1 | javac | 1m 1s | the patch passed |
| +1 | checkstyle | 0m 35s | the patch passed |
| +1 | mvnsite | 1m 0s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 29s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 9s | the patch passed |
| +1 | javadoc | 0m 56s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 122m 34s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 27s | The patch does not generate ASF License warnings. |
| | | 177m 5s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDecommission |
| | hadoop.hdfs.TestFileChecksum |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 |
| | hadoop.hdfs.TestErasureCodingPolicyWithSnapshotWithRandomECPolicy |
| | hadoop.hdfs.TestReplication |
| | hadoop.hdfs.TestHFlush |
| | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
| | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
| | hadoop.hdfs.TestDFSStripedOutputStream |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure170 |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure050 |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
| | hadoop.hdfs.TestSetrepIncreasing |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160 |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13175 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911282/HDFS-13175.00.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite
[jira] [Commented] (HDFS-13167) DatanodeAdminManager Improvements
[ https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370769#comment-16370769 ]

Hudson commented on HDFS-13167:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13688 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13688/])
HDFS-13167. DatanodeAdminManager Improvements. Contributed by BELUGA (inigoiri: rev 6f81cc0beea00843b44424417f09d8ee12cd7bae)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java

> DatanodeAdminManager Improvements
> ---------------------------------
> Key: HDFS-13167
> URL: https://issues.apache.org/jira/browse/HDFS-13167
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.0.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Trivial
> Fix For: 3.2.0
> Attachments: HDFS-13167.1.patch, HDFS-13167.2.patch, HDFS-13167.3.patch
>
> # Use Collection type Set instead of List for tracking nodes
> # Fix logging statements that are erroneously appending variables instead of using parameters
> # Miscellaneous small improvements
>
> As an example, the {{node}} variable is being appended to the string instead of being passed as an argument to the {{trace}} method for variable substitution:
> {code}
> LOG.trace("stopDecommission: Node {} in {}, nothing to do." +
>     node, node.getAdminState());
> {code}
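The logging bug called out in the HDFS-13167 description is easy to see with SLF4J-style "{}" substitution: concatenating a value onto the template leaves one argument short of the placeholders, so a literal "{}" survives in the output. The `fmt` helper below is a minimal hypothetical stand-in for the SLF4J formatter, used only so the example runs without a logging dependency:

```java
// Demonstrates the HDFS-13167 logging bug pattern. fmt() mimics SLF4J's
// positional "{}" substitution (it is NOT SLF4J itself).
public class LogParamDemo {
    static String fmt(String template, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        // Replace each "{}" with the next argument, left to right.
        while ((at = template.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(template, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        sb.append(template.substring(from));
        return sb.toString();
    }

    public static void main(String[] args) {
        String node = "dn1";
        String state = "LIVE";

        // Buggy: node is concatenated into the template string, so only one
        // argument remains for two "{}" slots and the output is mangled.
        System.out.println(fmt(
            "stopDecommission: Node {} in {}, nothing to do." + node, state));
        // prints: stopDecommission: Node LIVE in {}, nothing to do.dn1

        // Fixed: both values are passed as substitution arguments.
        System.out.println(fmt(
            "stopDecommission: Node {} in {}, nothing to do.", node, state));
        // prints: stopDecommission: Node dn1 in LIVE, nothing to do.
    }
}
```

Parameterized logging also avoids the cost of string concatenation when the log level is disabled, which is the usual reason SLF4J recommends placeholders over `+`.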
[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370765#comment-16370765 ]

Lei (Eddy) Xu commented on HDFS-13175:
--

[~anu] This before-stream write is trying to write the {{clusterInfo}} obtained from {{readClusterInfo(cmd)}}. The exception above is raised from {{readClusterInfo()}}, so {{beforeStream}} could not be written in this particular case. I moved that block of code before {{computePlan}} in the 01 patch.
[jira] [Updated] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-13175:
--
Attachment: HDFS-13175.01.patch
[jira] [Commented] (HDFS-13168) XmlImageVisitor - Prefer Array over LinkedList
[ https://issues.apache.org/jira/browse/HDFS-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370749#comment-16370749 ] Hudson commented on HDFS-13168:
--
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13687 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13687/])
HDFS-13168. XmlImageVisitor - Prefer Array over LinkedList. Contributed (inigoiri: rev 17c592e6cfd1ea3dbe9671c4703caabd095d87cf)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/XmlImageVisitor.java

> XmlImageVisitor - Prefer Array over LinkedList
>
>                 Key: HDFS-13168
>                 URL: https://issues.apache.org/jira/browse/HDFS-13168
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.0.0
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Minor
>             Fix For: 3.2.0
>         Attachments: HDFS-13168.1.patch, HDFS-13168.2.patch
>
> {{ArrayDeque}}:
> {quote}This class is likely to be faster than Stack when used as a stack, and faster than LinkedList when used as a queue.{quote}
> ... not to mention less memory fragmentation (a single backing array vs. many LinkedList nodes).
> https://docs.oracle.com/javase/8/docs/api/java/util/ArrayDeque.html
[jira] [Commented] (HDFS-13109) Support fully qualified hdfs path in EZ commands
[ https://issues.apache.org/jira/browse/HDFS-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370741#comment-16370741 ] genericqa commented on HDFS-13109: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 16s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 43s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 22s{color} | {color:green} hadoop-hdfs-client in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}118m 46s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}179m 17s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13109 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911273/HDFS-13109.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 720859b6050a 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9028cca | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit |
[jira] [Updated] (HDFS-13167) DatanodeAdminManager Improvements
[ https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-13167:
--
Resolution: Fixed
Fix Version/s: 3.2.0
Status: Resolved (was: Patch Available)

> DatanodeAdminManager Improvements
>
>                 Key: HDFS-13167
>                 URL: https://issues.apache.org/jira/browse/HDFS-13167
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.0.0
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Trivial
>             Fix For: 3.2.0
>         Attachments: HDFS-13167.1.patch, HDFS-13167.2.patch, HDFS-13167.3.patch
>
> # Use the Collection type Set instead of List for tracking nodes
> # Fix logging statements that erroneously append variables instead of passing them as parameters
> # Miscellaneous small improvements
> As an example, the {{node}} variable is appended to the format string instead of being passed as an argument to the {{trace}} method for variable substitution:
> {code}
> LOG.trace("stopDecommission: Node {} in {}, nothing to do." + node, node.getAdminState());
> {code}
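For reference, the effect of the quoted {{LOG.trace}} bug can be demonstrated with a small stand-in for SLF4J-style {{{}}} substitution. The {{fmt}} helper below is illustrative only (it is not the real SLF4J implementation): concatenating {{node}} into the template leaves the second placeholder unfilled and glues the node name onto the end of the message.

```java
public class TraceDemo {
    // Minimal stand-in for SLF4J-style "{}" substitution -- illustrative
    // only, not the real SLF4J implementation.
    public static String fmt(String template, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIdx = 0;
        int from = 0;
        int at;
        // Replace each "{}" with the next argument until either runs out.
        while ((at = template.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(template, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        return sb.append(template.substring(from)).toString();
    }

    public static void main(String[] args) {
        String node = "dn-1";
        String state = "DECOMMISSIONED";
        // Buggy form from the issue: node is concatenated into the template,
        // so a single argument feeds two placeholders.
        System.out.println(fmt("stopDecommission: Node {} in {}, nothing to do." + node, state));
        // prints: stopDecommission: Node DECOMMISSIONED in {}, nothing to do.dn-1
        // Fixed form: both values passed as substitution arguments.
        System.out.println(fmt("stopDecommission: Node {} in {}, nothing to do.", node, state));
        // prints: stopDecommission: Node dn-1 in DECOMMISSIONED, nothing to do.
    }
}
```

The corrected call in the patch presumably reads {{LOG.trace("stopDecommission: Node {} in {}, nothing to do.", node, node.getAdminState());}}.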
[jira] [Updated] (HDFS-13168) XmlImageVisitor - Prefer Array over LinkedList
[ https://issues.apache.org/jira/browse/HDFS-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-13168:
--
Resolution: Fixed
Fix Version/s: 3.2.0
Status: Resolved (was: Patch Available)
[jira] [Commented] (HDFS-13168) XmlImageVisitor - Prefer Array over LinkedList
[ https://issues.apache.org/jira/browse/HDFS-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370732#comment-16370732 ] Íñigo Goiri commented on HDFS-13168:
--
Thanks [~belugabehr] for the patch, committed to trunk.
[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370720#comment-16370720 ] Lei (Eddy) Xu commented on HDFS-13175:
--
Thanks a lot for the information, [~anu]. Let me go back to check and get back to you.
[jira] [Commented] (HDFS-13168) XmlImageVisitor - Prefer Array over LinkedList
[ https://issues.apache.org/jira/browse/HDFS-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370713#comment-16370713 ] BELUGA BEHR commented on HDFS-13168:
--
{{+}} is turned into {{StringBuilder}} under the covers. No preference, but I believe it's been all caps thus far. Thanks!!!
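A minimal illustration of the Javadoc claim quoted in this issue: the same {{ArrayDeque}} serves as both a LIFO stack and a FIFO queue, backed by a single resizable array (the helper names below are illustrative, not from the patch).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class DequeDemo {
    // ArrayDeque as a LIFO stack: push/pop operate on the head.
    public static int[] stackOrder() {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        return new int[] { stack.pop(), stack.pop(), stack.pop() };
    }

    // The same class as a FIFO queue: offer at the tail, poll from the head.
    public static int[] queueOrder() {
        Deque<Integer> queue = new ArrayDeque<>();
        queue.offer(1);
        queue.offer(2);
        queue.offer(3);
        return new int[] { queue.poll(), queue.poll(), queue.poll() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(stackOrder())); // [3, 2, 1]
        System.out.println(Arrays.toString(queueOrder())); // [1, 2, 3]
    }
}
```

Unlike {{LinkedList}}, there is no per-element node object, which is the memory-fragmentation point made above.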
[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370697#comment-16370697 ] Anu Engineer commented on HDFS-13175:
--
[~eddyxu] Just checked the code; we need to move this block to after we call {{readClusterInfo(cmd);}}
{code:java}
try (FSDataOutputStream beforeStream = create(String.format(
    DiskBalancerCLI.BEFORE_TEMPLATE,
    cmd.getOptionValue(DiskBalancerCLI.PLAN)))) {
  beforeStream.write(getCluster().toJson()
      .getBytes(StandardCharsets.UTF_8));
}
{code}
but before we call {{computePlan}}; that way we will always write {{before.json}}:
{{List plans = getCluster().computePlan(this.thresholdPercentage);}}
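The reordering suggested here boils down to "persist the raw input before the step that can throw". A self-contained sketch of that pattern follows; everything in it is illustrative, not the actual PlanCommand code: a {{StringBuilder}} stands in for the {{before.json}} output stream, and a boolean flag stands in for a failure inside {{computePlan}}.

```java
public class SnapshotFirst {
    // Capture the raw cluster JSON *before* running the step that may throw,
    // so the "before" snapshot always exists for post-mortem debugging.
    // beforeSink stands in for the before.json stream; planFails stands in
    // for an IllegalArgumentException thrown inside computePlan().
    public static void planWithSnapshot(String clusterJson, boolean planFails,
                                        StringBuilder beforeSink) {
        beforeSink.append(clusterJson); // snapshot first
        if (planFails) {
            throw new IllegalArgumentException("computePlan failed");
        }
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder();
        try {
            planWithSnapshot("{\"nodes\":[]}", true, sink);
        } catch (IllegalArgumentException expected) {
            // Planning failed, but the snapshot was still captured.
        }
        System.out.println("snapshot: " + sink); // snapshot: {"nodes":[]}
    }
}
```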
[jira] [Comment Edited] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370665#comment-16370665 ] Anu Engineer edited comment on HDFS-13175 at 2/20/18 10:29 PM:
--
[~eddyxu], We capture a file called datanode.before.json; that file contains the whole datanode report that we read from the NameNode connector. The default path is {{/system/diskbalancer/./before.json}}. Please see if you have that file; if so, we will be able to reproduce this issue. It is possible that we crashed before we wrote this file; if so, maybe we should save the data before we process it.
[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370665#comment-16370665 ] Anu Engineer commented on HDFS-13175:
--
[~eddyxu], We capture a file called datanode.before.json; that file contains the whole datanode report that we read from the NameNode connector. The default path is {{/system/diskbalancer/./before.json}}. Please see if you have that file; if so, we will be able to reproduce this issue.
[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370655#comment-16370655 ] Anu Engineer commented on HDFS-13175:
--
+1, pending Jenkins. Thanks for filing and fixing this issue.
[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370650#comment-16370650 ] Lei (Eddy) Xu commented on HDFS-13175:
--
The patch also deletes a duplicated line of {{volume.setUsed(report.getDfsUsed());}}.
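The kind of message the patch presumably adds can be sketched in plain Java; Guava's {{checkArgument(boolean, String, Object...)}} overload achieves the same thing in one call. The field names mirror the {{setUsed}} snippet quoted above, but the class, the path field, and the message wording are illustrative, not the patch's actual text.

```java
public class DiskUsageCheck {
    private final String path;
    private final long capacity;
    private long used;

    public DiskUsageCheck(String path, long capacity) {
        this.path = path;
        this.capacity = capacity;
    }

    // Instead of a bare checkArgument(boolean), include the offending values
    // in the exception message so a production stack trace is diagnosable
    // without the original datanode reports.
    public void setUsed(long dfsUsedSpace) {
        if (dfsUsedSpace < 0 || dfsUsedSpace >= capacity) {
            throw new IllegalArgumentException(String.format(
                "Volume %s: invalid dfsUsedSpace %d, must be in [0, capacity=%d)",
                path, dfsUsedSpace, capacity));
        }
        this.used = dfsUsedSpace;
    }

    public long getUsed() {
        return used;
    }
}
```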
[jira] [Updated] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-13175:
--
Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-13161) Update comment in start-dfs.sh to mention correct variable for secure datanode user
[ https://issues.apache.org/jira/browse/HDFS-13161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370649#comment-16370649 ] Ajay Kumar commented on HDFS-13161: --- [~vagarychen] Thanks for review and commit. > Update comment in start-dfs.sh to mention correct variable for secure > datanode user > > > Key: HDFS-13161 > URL: https://issues.apache.org/jira/browse/HDFS-13161 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Minor > Attachments: HDFS-13161.000.patch > > > start-dfs.sh mentions that for secure DN startup we need to set > HADOOP_SECURE_DN_USER. > The correct variable is HDFS_DATANODE_SECURE_USER. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-13175: - Attachment: HDFS-13175.00.patch > Add more information for checking argument in DiskBalancerVolume > > > Key: HDFS-13175 > URL: https://issues.apache.org/jira/browse/HDFS-13175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: diskbalancer >Affects Versions: 3.0.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Minor > Attachments: HDFS-13175.00.patch > > > We have seen the following stack in production > {code} > Exception in thread "main" java.lang.IllegalArgumentException > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at > org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268) > at > org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141) > at > org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90) > at > org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132) > at > org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123) > at > org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107) > {code} > raised from > {code} > public void setUsed(long dfsUsedSpace) { > Preconditions.checkArgument(dfsUsedSpace < this.getCapacity()); > this.used = dfsUsedSpace; > } > {code} > However, the datanode reports at the very moment were not captured. We should > add more information into the stack trace to better diagnose the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
[ https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370638#comment-16370638 ] genericqa commented on HDFS-13108: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 34s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 56s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 2s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 20s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 4s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 9s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 14s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s{color} | {color:green} HDFS-7240 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 12s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 20s{color} | {color:orange} root: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 40s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}109m 43s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 29s{color} | {color:green} hadoop-ozone in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 43s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}215m 8s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.server.namenode.TestTruncateQuotaUpdate | | | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | | hadoop.hdfs.server.namenode.TestCheckpoint | | | hadoop.hdfs.server.namenode.TestNameEditsConfigs | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d11161b | | JIRA Issue | HDFS-13108 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911263/HDFS-13108-HDFS-7240.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2da01179cea9 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64
[jira] [Updated] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
[ https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-13175: - Description: We have seen the following stack in production {code} Exception in thread "main" java.lang.IllegalArgumentException at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) at org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268) at org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141) at org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90) at org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132) at org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123) at org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107) {code} raised from {code} public void setUsed(long dfsUsedSpace) { Preconditions.checkArgument(dfsUsedSpace < this.getCapacity()); this.used = dfsUsedSpace; } {code} However, the datanode reports at the very moment were not captured. We should add more information into the stack trace to better diagnose the issue. 
was: We have seen the following stack in production {code Exception in thread "main" java.lang.IllegalArgumentException at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) at org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268) at org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141) at org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90) at org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132) at org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123) at org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107) {code} raised from {code} public void setUsed(long dfsUsedSpace) { Preconditions.checkArgument(dfsUsedSpace < this.getCapacity()); this.used = dfsUsedSpace; } {code} However, the datanode reports at the very moment were not captured. We should add more information into the stack trace to better diagnose the issue. 
> Add more information for checking argument in DiskBalancerVolume > > > Key: HDFS-13175 > URL: https://issues.apache.org/jira/browse/HDFS-13175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: diskbalancer >Affects Versions: 3.0.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Minor > > We have seen the following stack in production > {code} > Exception in thread "main" java.lang.IllegalArgumentException > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at > org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268) > at > org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141) > at > org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90) > at > org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132) > at > org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123) > at > org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107) > {code} > raised from > {code} > public void setUsed(long dfsUsedSpace) { > Preconditions.checkArgument(dfsUsedSpace < this.getCapacity()); > this.used = dfsUsedSpace; > } > {code} > However, the datanode reports at the very moment were not captured. We should > add more information into the stack trace to better diagnose the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume
Lei (Eddy) Xu created HDFS-13175: Summary: Add more information for checking argument in DiskBalancerVolume Key: HDFS-13175 URL: https://issues.apache.org/jira/browse/HDFS-13175 Project: Hadoop HDFS Issue Type: Improvement Components: diskbalancer Affects Versions: 3.0.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu We have seen the following stack in production {code Exception in thread "main" java.lang.IllegalArgumentException at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) at org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268) at org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141) at org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90) at org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132) at org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123) at org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107) {code} raised from {code} public void setUsed(long dfsUsedSpace) { Preconditions.checkArgument(dfsUsedSpace < this.getCapacity()); this.used = dfsUsedSpace; } {code} However, the datanode reports at the very moment were not captured. We should add more information into the stack trace to better diagnose the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
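The fix HDFS-13175 asks for amounts to passing the operands into the precondition message. Below is a hypothetical, self-contained sketch of that idea, not the actual Hadoop patch: `checkArgument` here is a stand-in for Guava's `Preconditions.checkArgument(boolean, String, Object...)` overload, which fills `%s` placeholders from the varargs, and the capacity value and message wording are assumptions for illustration.

```java
// Hypothetical sketch (not the actual Hadoop patch): shows how the bare
// Preconditions.checkArgument in DiskBalancerVolume#setUsed could carry the
// offending values, so the IllegalArgumentException becomes diagnosable.
public class DiskBalancerVolumeSketch {
    private final long capacity = 100L;  // assumed value for illustration
    private long used;

    // Stand-in for Guava's Preconditions.checkArgument(boolean, String, Object...),
    // which substitutes %s placeholders from the varargs.
    static void checkArgument(boolean expression, String template, Object... args) {
        if (!expression) {
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    public void setUsed(long dfsUsedSpace) {
        // Instead of a bare boolean check, report both operands in the message.
        checkArgument(dfsUsedSpace < capacity,
            "dfsUsedSpace %s must be less than volume capacity %s",
            dfsUsedSpace, capacity);
        this.used = dfsUsedSpace;
    }

    public static void main(String[] args) {
        DiskBalancerVolumeSketch vol = new DiskBalancerVolumeSketch();
        vol.setUsed(50L);  // within capacity: accepted
        try {
            vol.setUsed(150L);  // over capacity: now fails with context
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the message template, the same failure that was opaque in the production stack trace would state both the reported usage and the capacity at the moment of the check.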
[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters
[ https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370621#comment-16370621 ] Lei (Eddy) Xu commented on HDFS-13119: -- Hi, [~elgoiri] This looks not like a blocker for 3.0.1 to me. Lets make it 3.0.2 then. > RBF: Manage unavailable clusters > > > Key: HDFS-13119 > URL: https://issues.apache.org/jira/browse/HDFS-13119 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Íñigo Goiri >Assignee: Yiqun Lin >Priority: Major > Labels: RBF > Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2, 3.2.0 > > Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, > HDFS-13119.003.patch, HDFS-13119.004.patch, HDFS-13119.005.patch, > HDFS-13119.006.patch > > > When a federated cluster has one of the subcluster down, operations that run > in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC > connections. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[Editorial note: the comment above reads more smoothly as "This does not look like a blocker for 3.0.1 to me. Let's make it 3.0.2 then."]
[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370607#comment-16370607 ] genericqa commented on HDFS-11187: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 2s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} branch-2.7 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 10s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 3s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s{color} | {color:green} branch-2.7 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} 
javac {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 23s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 172 unchanged - 0 fixed = 173 total (was 172) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 61 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}108m 38s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}151m 3s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-hdfs:28 | | Failed junit tests | hadoop.hdfs.server.datanode.TestFsDatasetCache | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | | Timed out junit tests | org.apache.hadoop.hdfs.TestWriteRead | | | org.apache.hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage | | | org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool | | | org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | | | org.apache.hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport | | | org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics | | | org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade | | | org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery | | | org.apache.hadoop.hdfs.TestPread | | | org.apache.hadoop.hdfs.TestFileAppend4 | | | org.apache.hadoop.hdfs.TestRollingUpgradeDowngrade | | | org.apache.hadoop.hdfs.server.datanode.TestBatchIbr | | | org.apache.hadoop.hdfs.TestDecommission | | | org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestInterDatanodeProtocol | | | org.apache.hadoop.hdfs.TestDFSUpgrade | | | org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles | | | org.apache.hadoop.hdfs.server.namenode.TestCheckpoint | | |
[jira] [Updated] (HDFS-13119) RBF: Manage unavailable clusters
[ https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-13119: --- Fix Version/s: 3.0.2 2.9.1 > RBF: Manage unavailable clusters > > > Key: HDFS-13119 > URL: https://issues.apache.org/jira/browse/HDFS-13119 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Íñigo Goiri >Assignee: Yiqun Lin >Priority: Major > Labels: RBF > Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2, 3.2.0 > > Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, > HDFS-13119.003.patch, HDFS-13119.004.patch, HDFS-13119.005.patch, > HDFS-13119.006.patch > > > When a federated cluster has one of the subcluster down, operations that run > in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC > connections. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13165) [SPS]: Collects successfully moved block details via IBR
[ https://issues.apache.org/jira/browse/HDFS-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370604#comment-16370604 ] genericqa commented on HDFS-13165: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 17 new or modified test files. {color} | || || || || {color:brown} HDFS-10285 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 40s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 14s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} HDFS-10285 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 50s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 26 new + 887 unchanged - 4 fixed = 913 total (was 891) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 17s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 22s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}156m 27s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestPersistentStoragePolicySatisfier | | | hadoop.hdfs.TestFileChecksum | | | hadoop.hdfs.server.namenode.TestTruncateQuotaUpdate | | | hadoop.hdfs.server.namenode.TestReencryptionWithKMS | | | hadoop.hdfs.server.namenode.sps.TestStoragePolicySatisfier | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13165 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911264/HDFS-13165-HDFS-10285-01.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc | | uname | Linux 539e3ea12cc8 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality
[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters
[ https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370606#comment-16370606 ] Íñigo Goiri commented on HDFS-13119: Thanks [~chris.douglas] for the clarification. I pushed to {{branch-2.9}} and {{branch-3.0}} and added 2.9.1 and 3.0.2 as fix versions. > RBF: Manage unavailable clusters > > > Key: HDFS-13119 > URL: https://issues.apache.org/jira/browse/HDFS-13119 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Íñigo Goiri >Assignee: Yiqun Lin >Priority: Major > Labels: RBF > Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2, 3.2.0 > > Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, > HDFS-13119.003.patch, HDFS-13119.004.patch, HDFS-13119.005.patch, > HDFS-13119.006.patch > > > When a federated cluster has one of the subcluster down, operations that run > in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC > connections. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13167) DatanodeAdminManager Improvements
[ https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370591#comment-16370591 ] Íñigo Goiri commented on HDFS-13167: [^HDFS-13167.3.patch] LGTM. Committing to trunk. > DatanodeAdminManager Improvements > - > > Key: HDFS-13167 > URL: https://issues.apache.org/jira/browse/HDFS-13167 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: HDFS-13167.1.patch, HDFS-13167.2.patch, > HDFS-13167.3.patch > > > # Use Collection type Set instead of List for tracking nodes > # Fix logging statements that are erroneously appending variables instead of > using parameters > # Miscellaneous small improvements > As an example, the {{node}} variable is being appended to the string instead > of being passed as an argument to the {{trace}} method for variable > substitution. > {code} > LOG.trace("stopDecommission: Node {} in {}, nothing to do." + > node, node.getAdminState()); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
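The buggy `LOG.trace` call quoted in HDFS-13167 concatenates `node` into the format string instead of passing it as a substitution argument, which leaves one `{}` placeholder unfilled. A minimal sketch of the difference, under the assumption that `trace()` below mimics SLF4J's `{}` parameter substitution (it is a local stand-in, not the real logger), with made-up node and state values:

```java
// Sketch of the logging bug HDFS-13167 describes: concatenating a value into
// the format string (instead of passing it as an argument) leaves a {}
// placeholder unfilled. trace() is a local stand-in for SLF4J-style logging.
public class LogParamSketch {
    // Replace each {} in fmt with the next argument, left to right.
    static String trace(String fmt, Object... args) {
        StringBuilder sb = new StringBuilder();
        int i = 0, from = 0, at;
        while ((at = fmt.indexOf("{}", from)) >= 0 && i < args.length) {
            sb.append(fmt, from, at).append(args[i++]);
            from = at + 2;
        }
        sb.append(fmt.substring(from));
        return sb.toString();
    }

    public static void main(String[] args) {
        String node = "dn1:9866";               // assumed example values
        String state = "DECOMMISSIONED";
        // Buggy form: node is appended to the template, so only one argument
        // remains for two placeholders and the second {} survives verbatim.
        System.out.println(trace(
            "stopDecommission: Node {} in {}, nothing to do." + node, state));
        // Fixed form: both values are passed as substitution arguments.
        System.out.println(trace(
            "stopDecommission: Node {} in {}, nothing to do.", node, state));
    }
}
```

Beyond correctness, the fixed form also avoids building the concatenated string when trace logging is disabled, which is the usual reason parameterized logging is preferred.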
[jira] [Commented] (HDFS-13109) Support fully qualified hdfs path in EZ commands
[ https://issues.apache.org/jira/browse/HDFS-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370572#comment-16370572 ] Hanisha Koneru commented on HDFS-13109: --- Thanks for reviewing the patch, [~shahrs87]. {quote}The variable {{dfs}} in {{HdfsAdmin}} referred to {{DistributedFileSystem}} whereas in {{DistributedFileSystem}}, {{dfs}} refers to {{DFSClient}}.{{DistributedFileSystem#getEZForPath}} resolves the path and calls {{DFSClient#getEZForPath}}. Whereas after the patch, it won't resolve the path. {quote} I did not resolve the path as the input path to {{private void provisionEZTrash}} is already resolved by the calling method. So we can skip {{DistributedFileSystem#getEZForPath}} and directly call {{DFSClient#getEZForPath}}. Please correct me if I understood wrongly. {quote}You have already resolved the path in the calling function public void provisionEZTrash. You can just pass the resolved path to the private method provisionEZTrash instead of getPathName. {quote} Yes, thanks for catching this. We can skip the {{getPathName}} and directly pass the resolved path component. Addressed other review comments and checkstyle issues in patch v03. > Support fully qualified hdfs path in EZ commands > > > Key: HDFS-13109 > URL: https://issues.apache.org/jira/browse/HDFS-13109 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13109.001.patch, HDFS-13109.002.patch, > HDFS-13109.003.patch > > > When creating an Encryption Zone, if the fully qualified path is specified in > the path argument, it throws the following error. > {code:java} > ~$ hdfs crypto -createZone -keyName mykey1 -path hdfs://ns1/zone1 > IllegalArgumentException: hdfs://ns1/zone1 is not the root of an encryption > zone. Do you mean /zone1? 
> ~$ hdfs crypto -createZone -keyName mykey1 -path "hdfs://namenode:9000/zone2" > IllegalArgumentException: hdfs://namenode:9000/zone2 is not the root of an > encryption zone. Do you mean /zone2? > {code} > The EZ creation succeeds as the path is resolved in > DFS#createEncryptionZone(). But while creating the Trash directory, the path > is not resolved and it throws the above error. > A fully qualified path should be supported by {{crypto}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
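The HDFS-13109 errors arise because a fully qualified path such as `hdfs://ns1/zone1` must be reduced to its filesystem-local component (`/zone1`) before it is compared against encryption-zone roots; the zone-creation path does this resolution but the trash-provisioning path did not. A minimal sketch of that reduction, assuming `toLocalPath` is a hypothetical helper (the real patch works through `DistributedFileSystem` path resolution, not `java.net.URI`):

```java
import java.net.URI;

// Hypothetical sketch of the path reduction HDFS-13109 needs: strip the
// scheme and authority from a fully qualified HDFS path so that
// hdfs://ns1/zone1 compares equal to the EZ root /zone1.
public class EzPathSketch {
    static String toLocalPath(String fullyQualified) {
        // URI.getPath() drops the scheme (hdfs) and authority (ns1 or
        // namenode:9000), leaving only the filesystem-local path.
        return URI.create(fullyQualified).getPath();
    }

    public static void main(String[] args) {
        System.out.println(toLocalPath("hdfs://ns1/zone1"));           // /zone1
        System.out.println(toLocalPath("hdfs://namenode:9000/zone2")); // /zone2
    }
}
```

This mirrors the two failing invocations in the issue description: both nameservice-style (`hdfs://ns1/...`) and host:port-style (`hdfs://namenode:9000/...`) authorities reduce to the same local path.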
[jira] [Updated] (HDFS-13109) Support fully qualified hdfs path in EZ commands
[ https://issues.apache.org/jira/browse/HDFS-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-13109: -- Attachment: HDFS-13109.003.patch > Support fully qualified hdfs path in EZ commands > > > Key: HDFS-13109 > URL: https://issues.apache.org/jira/browse/HDFS-13109 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13109.001.patch, HDFS-13109.002.patch, > HDFS-13109.003.patch > > > When creating an Encryption Zone, if the fully qualified path is specified in > the path argument, it throws the following error. > {code:java} > ~$ hdfs crypto -createZone -keyName mykey1 -path hdfs://ns1/zone1 > IllegalArgumentException: hdfs://ns1/zone1 is not the root of an encryption > zone. Do you mean /zone1? > ~$ hdfs crypto -createZone -keyName mykey1 -path "hdfs://namenode:9000/zone2" > IllegalArgumentException: hdfs://namenode:9000/zone2 is not the root of an > encryption zone. Do you mean /zone2? > {code} > The EZ creation succeeds as the path is resolved in > DFS#createEncryptionZone(). But while creating the Trash directory, the path > is not resolved and it throws the above error. > A fully qualified path should be supported by {{crypto}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13167) DatanodeAdminManager Improvements
[ https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370553#comment-16370553 ] genericqa commented on HDFS-13167: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 6s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}125m 16s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}177m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestTruncateQuotaUpdate | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.server.namenode.TestDecommissioningStatus | | | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13167 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911255/HDFS-13167.3.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 620e2dca651b 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8896d20 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23129/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results |
[jira] [Commented] (HDFS-13159) TestTruncateQuotaUpdate fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370537#comment-16370537 ] Hudson commented on HDFS-13159: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13686 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13686/]) HDFS-13159. TestTruncateQuotaUpdate fails in trunk. Contributed by Nanda (arp: rev 9028ccaf838621808e5e26a9fa933d28799538dd) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestTruncateQuotaUpdate.java > TestTruncateQuotaUpdate fails in trunk > -- > > Key: HDFS-13159 > URL: https://issues.apache.org/jira/browse/HDFS-13159 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Arpit Agarwal >Assignee: Nanda kumar >Priority: Major > Fix For: 3.2.0 > > Attachments: HDFS-13159.000.patch, HDFS-13159.001.patch > > > Details in comment below. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters
[ https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370517#comment-16370517 ] Chris Douglas commented on HDFS-13119: -- bq. As this is technically a bug, I'd like to push it for 2.9.1 and 3.0.1 (or 3.0.2). There's a vote for 3.0.1 in progress, but you can contact the release manager ([~eddyxu]) in case he rolls another RC. bq. Any idea what's the current state with the branches? My guess is branch-2.9 and branch-3.0. AFAIK:
trunk -> 3.2
3.1.0 -> branch-3.1
3.0.2 -> branch-3.0
3.0.1 -> branch-3.0.1
2.10 -> branch-2
2.9.1 -> branch-2.9
> RBF: Manage unavailable clusters > > > Key: HDFS-13119 > URL: https://issues.apache.org/jira/browse/HDFS-13119 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Íñigo Goiri >Assignee: Yiqun Lin >Priority: Major > Labels: RBF > Fix For: 3.1.0, 2.10.0, 3.2.0 > > Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, > HDFS-13119.003.patch, HDFS-13119.004.patch, HDFS-13119.005.patch, > HDFS-13119.006.patch > > > When a federated cluster has one of its subclusters down, operations that run > in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC > connections. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13070) Ozone: SCM: Support for container replica reconciliation - 1
[ https://issues.apache.org/jira/browse/HDFS-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDFS-13070: --- Fix Version/s: HDFS-7240 > Ozone: SCM: Support for container replica reconciliation - 1 > > > Key: HDFS-13070 > URL: https://issues.apache.org/jira/browse/HDFS-13070 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Assignee: Nanda kumar >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-13070-HDFS-7240.000.patch, > HDFS-13070-HDFS-7240.001.patch > > > SCM should process container reports and identify under replicated containers > for re-replication. {{ContainerSupervisor}} should take one NodePool at a > time and start processing the container reports of datanodes in that > NodePool. In this jira we just integrate {{ContainerSupervisor}} into SCM, > actual reconciliation logic will be handled in follow-up jiras. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12977) Add stateId to RPC headers.
[ https://issues.apache.org/jira/browse/HDFS-12977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370481#comment-16370481 ] Plamen Jeliazkov commented on HDFS-12977: - Thanks for taking a look Konstantin. With regards to #1 – If I try to do that right away I would have to import the hdfs module into common. Perhaps I can find a smarter way around that though. May require changing the constructor for Call though. And with #2 – that changes the logic to fetch the EditLog's Txid without the writeLock. As long as this is called after the response is created I think we are OK, but I am not sure how this will work out with the async EditLog feature. We don't want to end up in a situation where the client gets a response with a Txid that is actually behind its request's/response's Txid. > Add stateId to RPC headers. > --- > > Key: HDFS-12977 > URL: https://issues.apache.org/jira/browse/HDFS-12977 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ipc, namenode >Reporter: Konstantin Shvachko >Assignee: Plamen Jeliazkov >Priority: Major > Attachments: HDFS_12977.trunk.001.patch > > > stateId is a new field in the RPC headers of NameNode proto calls. > stateId is the journal transaction Id, which represents LastSeenId for the > clients and LastWrittenId for NameNodes. See more in [reads from Standby > design > doc|https://issues.apache.org/jira/secure/attachment/12902925/ConsistentReadsFromStandbyNode.pdf]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
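The ordering concern raised in the comment above — a client must never observe a stateId older than one it has already seen — can be guarded on the client side by keeping the last-seen txid monotonic. This is a hypothetical sketch of that idea, not the attached patch; the class and method names are illustrative:

```java
import java.util.concurrent.atomic.AtomicLong;

public class ClientStateId {
    // Hypothetical tracker for the stateId carried in RPC headers: the
    // client's view of the journal must never move backwards, even if a
    // response arrives carrying an older txid than one already observed.
    private final AtomicLong lastSeenTxid = new AtomicLong(0);

    public long update(long responseTxid) {
        // max() keeps the view monotonic regardless of response reordering.
        return lastSeenTxid.accumulateAndGet(responseTxid, Math::max);
    }

    public long lastSeen() {
        return lastSeenTxid.get();
    }

    public static void main(String[] args) {
        ClientStateId state = new ClientStateId();
        state.update(5);
        state.update(3);  // stale response: last-seen stays at 5
        System.out.println(state.lastSeen());
    }
}
```

This only protects the client's bookkeeping; it does not address where on the server the txid should be read relative to the writeLock, which is the open question in the thread.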
[jira] [Commented] (HDFS-13070) Ozone: SCM: Support for container replica reconciliation - 1
[ https://issues.apache.org/jira/browse/HDFS-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370477#comment-16370477 ] Nanda kumar commented on HDFS-13070: Thanks [~anu] for the review. I have committed this to the feature branch. > Ozone: SCM: Support for container replica reconciliation - 1 > > > Key: HDFS-13070 > URL: https://issues.apache.org/jira/browse/HDFS-13070 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Assignee: Nanda kumar >Priority: Major > Attachments: HDFS-13070-HDFS-7240.000.patch, > HDFS-13070-HDFS-7240.001.patch > > > SCM should process container reports and identify under replicated containers > for re-replication. {{ContainerSupervisor}} should take one NodePool at a > time and start processing the container reports of datanodes in that > NodePool. In this jira we just integrate {{ContainerSupervisor}} into SCM, > actual reconciliation logic will be handled in follow-up jiras. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13070) Ozone: SCM: Support for container replica reconciliation - 1
[ https://issues.apache.org/jira/browse/HDFS-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDFS-13070: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > Ozone: SCM: Support for container replica reconciliation - 1 > > > Key: HDFS-13070 > URL: https://issues.apache.org/jira/browse/HDFS-13070 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Assignee: Nanda kumar >Priority: Major > Attachments: HDFS-13070-HDFS-7240.000.patch, > HDFS-13070-HDFS-7240.001.patch > > > SCM should process container reports and identify under replicated containers > for re-replication. {{ContainerSupervisor}} should take one NodePool at a > time and start processing the container reports of datanodes in that > NodePool. In this jira we just integrate {{ContainerSupervisor}} into SCM, > actual reconciliation logic will be handled in follow-up jiras. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13159) TestTruncateQuotaUpdate fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-13159: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.2.0 Status: Resolved (was: Patch Available) +1 I've committed this. Thanks for the quick fix [~nandakumar131]! > TestTruncateQuotaUpdate fails in trunk > -- > > Key: HDFS-13159 > URL: https://issues.apache.org/jira/browse/HDFS-13159 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Arpit Agarwal >Assignee: Nanda kumar >Priority: Major > Fix For: 3.2.0 > > Attachments: HDFS-13159.000.patch, HDFS-13159.001.patch > > > Details in comment below. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13078) Ozone: Update Ratis on Ozone to 0.1.1-alpha-8fd74ed-SNAPSHOT, to fix large chunk reads (>4M) from Datanodes
[ https://issues.apache.org/jira/browse/HDFS-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370459#comment-16370459 ] Anu Engineer edited comment on HDFS-13078 at 2/20/18 7:06 PM: -- [~msingh] Thanks for the patch. [~szetszwo] Thanks for the review, I have committed this to the feature branch. {{TestKeys}} passed on running locally, just FYI. was (Author: anu): [~msingh] Thanks for the patch. [~szetszwo] Thanks for the review, I have committed this to the feature branch. > Ozone: Update Ratis on Ozone to 0.1.1-alpha-8fd74ed-SNAPSHOT, to fix large > chunk reads (>4M) from Datanodes > --- > > Key: HDFS-13078 > URL: https://issues.apache.org/jira/browse/HDFS-13078 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-13078-HDFS-7240.001.patch, > HDFS-13078-HDFS-7240.002.patch, HDFS-13078-HDFS-7240.003.patch, > HDFS-13078-HDFS-7240.004.patch, HDFS-13078-HDFS-7240.005.patch > > > In Ozone, reads from Ratis read fail because stream is closed before the > reply is received. 
> {code} > Jan 23, 2018 1:27:14 PM > org.apache.ratis.shaded.io.grpc.netty.NettyServerHandler onStreamError > WARNING: Stream Error > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException: > Stream closed before write could take place > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:149) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:499) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:480) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:349) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:985) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:941) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:497) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:503) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:587) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:356) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80) > at > 
org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:516) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:260) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:388) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:448) > at > org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) > at > org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) > at > org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) > at >
[jira] [Updated] (HDFS-13078) Ozone: Update Ratis on Ozone to 0.1.1-alpha-8fd74ed-SNAPSHOT, to fix large chunk reads (>4M) from Datanodes
[ https://issues.apache.org/jira/browse/HDFS-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-13078: Resolution: Fixed Status: Resolved (was: Patch Available) [~msingh] Thanks for the patch. [~szetszwo] Thanks for the review, I have committed this to the feature branch. > Ozone: Update Ratis on Ozone to 0.1.1-alpha-8fd74ed-SNAPSHOT, to fix large > chunk reads (>4M) from Datanodes > --- > > Key: HDFS-13078 > URL: https://issues.apache.org/jira/browse/HDFS-13078 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-13078-HDFS-7240.001.patch, > HDFS-13078-HDFS-7240.002.patch, HDFS-13078-HDFS-7240.003.patch, > HDFS-13078-HDFS-7240.004.patch, HDFS-13078-HDFS-7240.005.patch > > > In Ozone, reads from Ratis read fail because stream is closed before the > reply is received. > {code} > Jan 23, 2018 1:27:14 PM > org.apache.ratis.shaded.io.grpc.netty.NettyServerHandler onStreamError > WARNING: Stream Error > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException: > Stream closed before write could take place > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:149) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:499) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:480) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:349) > at > 
org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:985) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:941) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:497) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:503) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:587) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:356) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:516) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:260) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:388) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:448) > at > 
org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) > at > org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) > at > org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) > at > org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) > at >
[jira] [Commented] (HDFS-13165) [SPS]: Collects successfully moved block details via IBR
[ https://issues.apache.org/jira/browse/HDFS-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370443#comment-16370443 ] Rakesh R commented on HDFS-13165: - Attached new patch. Following are the changes compared to the previous patch: - built a data structure to match the RECEIVED_BLOCK with the expected block moves, then updates the track list {{file vs block moves}}. - made DatanodeProtocol.proto changes by removing the {{BlocksStorageMoveAttemptFinishedProto}} status result, which was added earlier to notify SPS. - cleaned up BlockStorageMovementTracker - made a few minor log changes. Note: the patch is quite big due to proto changes and unit test case refactoring. > [SPS]: Collects successfully moved block details via IBR > > > Key: HDFS-13165 > URL: https://issues.apache.org/jira/browse/HDFS-13165 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Attachments: HDFS-13165-HDFS-10285-00.patch, > HDFS-13165-HDFS-10285-01.patch > > > This task is to make use of the existing IBR to get moved block details and > remove the unwanted future-tracking logic that exists in BlockStorageMovementTracker > code; this is no longer needed as the file-level tracking is maintained at the NN > itself. > Following comments taken from HDFS-10285, > [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16347472] > Comment-3) > {quote}BPServiceActor > Is it actually sending back the moved blocks? Aren’t IBRs sufficient?{quote} > Comment-21) > {quote} > BlockStorageMovementTracker > Many data structures are riddled with non-threadsafe race conditions and risk > of CMEs. > Ex. The moverTaskFutures map. Adding new blocks and/or adding to a block's > list of futures is synchronized. 
However the run loop does an unsynchronized > block get, unsynchronized future remove, unsynchronized isEmpty, possibly > another unsynchronized get, only then does it do a synchronized remove of the > block. The whole chunk of code should be synchronized. > Is the problematic moverTaskFutures even needed? It's aggregating futures > per-block for seemingly no reason. Why track all the futures at all instead > of just relying on the completion service? As best I can tell: > It's only used to determine if a future from the completion service should be > ignored during shutdown. Shutdown sets the running boolean to false and > clears the entire datastructure so why not use the running boolean like a > check just a little further down? > As synchronization to sleep up to 2 seconds before performing a blocking > moverCompletionService.take, but only when it thinks there are no active > futures. I'll ignore the missed notify race that the bounded wait masks, but > the real question is why not just do the blocking take? > Why all the complexity? Am I missing something? > BlocksMovementsStatusHandler > Suffers same type of thread safety issues as StoragePolicySatisfyWorker. Ex. > blockIdVsMovementStatus is inconsistent synchronized. Does synchronize to > return an unmodifiable list which sadly does nothing to protect the caller > from CME. > handle is iterating over a non-thread safe list. > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
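The quoted review notes that returning an unmodifiable list "does nothing to protect the caller from CME"; the hazard is easy to reproduce in isolation. This standalone sketch uses plain JDK collections (illustrative class names, not the actual BlocksMovementsStatusHandler code) to contrast a live unmodifiable view with a defensive snapshot copy:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

public class UnmodifiableViewDemo {
    // An unmodifiable wrapper is a live view of the backing list, so a
    // concurrent writer still fails the reader with a CME.
    public static boolean viewHitsCme() {
        List<String> backing = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> view = Collections.unmodifiableList(backing);
        try {
            for (String s : view) {
                backing.add("x");  // simulates the concurrent writer
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Handing the caller a snapshot copy decouples its iteration from writers.
    public static int iterateSnapshot() {
        List<String> backing = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> snapshot = new ArrayList<>(backing);
        for (String s : snapshot) {
            backing.add("y");  // does not disturb the snapshot iteration
        }
        return snapshot.size();
    }

    public static void main(String[] args) {
        System.out.println("view hits CME: " + viewHitsCme());
        System.out.println("snapshot size: " + iterateSnapshot());
    }
}
```

Copying on every call has a cost, so in practice a concurrent collection or coarser synchronization (as the review suggests) may be the better fix; the sketch only demonstrates why the unmodifiable wrapper alone is insufficient.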
[jira] [Commented] (HDFS-13078) Ozone: Update Ratis on Ozone to 0.1.1-alpha-8fd74ed-SNAPSHOT, to fix large chunk reads (>4M) from Datanodes
[ https://issues.apache.org/jira/browse/HDFS-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370424#comment-16370424 ] Anu Engineer commented on HDFS-13078: - [~szetszwo] Thanks for the review. I will commit this patch shortly. > Ozone: Update Ratis on Ozone to 0.1.1-alpha-8fd74ed-SNAPSHOT, to fix large > chunk reads (>4M) from Datanodes > --- > > Key: HDFS-13078 > URL: https://issues.apache.org/jira/browse/HDFS-13078 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-13078-HDFS-7240.001.patch, > HDFS-13078-HDFS-7240.002.patch, HDFS-13078-HDFS-7240.003.patch, > HDFS-13078-HDFS-7240.004.patch, HDFS-13078-HDFS-7240.005.patch > > > In Ozone, reads from Ratis read fail because stream is closed before the > reply is received. > {code} > Jan 23, 2018 1:27:14 PM > org.apache.ratis.shaded.io.grpc.netty.NettyServerHandler onStreamError > WARNING: Stream Error > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException: > Stream closed before write could take place > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:149) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:499) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:480) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:349) > at > 
org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:985) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:941) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:497) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:503) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:587) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:356) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:516) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:260) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:388) > at > org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:448) > at > 
org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) > at > org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) > at > org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) > at > org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) > at > org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) > at >
[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters
[ https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370421#comment-16370421 ] Íñigo Goiri commented on HDFS-13119: [~linyiqun] I've been running this locally and the test takes a long time to run. Right now, it does two rounds of 10 retries, each with a timeout of 1 second and a sleep of 1 second. Checking the [test results|https://builds.apache.org/job/PreCommit-HDFS-Build/23121/testReport/org.apache.hadoop.hdfs.server.federation.router/TestRouterRPCClientRetries/], this makes 40 seconds for {{testRetryWhenOneNameServiceDown}} and 16 for {{testRetryWhenAllNameServiceDown}}. I think this is unnecessary and we could tune: * IPC_CLIENT_CONNECT_MAX_RETRIES_KEY * IPC_CLIENT_CONNECT_RETRY_INTERVAL_KEY In addition, to reduce the creation of {{MiniDFSCluster}}, we could use {{BeforeClass}}. If you are on board, I would open a new JIRA for this. > RBF: Manage unavailable clusters > > > Key: HDFS-13119 > URL: https://issues.apache.org/jira/browse/HDFS-13119 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Íñigo Goiri >Assignee: Yiqun Lin >Priority: Major > Labels: RBF > Fix For: 3.1.0, 2.10.0, 3.2.0 > > Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, > HDFS-13119.003.patch, HDFS-13119.004.patch, HDFS-13119.005.patch, > HDFS-13119.006.patch > > > When a federated cluster has one of its subclusters down, operations that run > in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC > connections. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
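The 40-second figure follows from simple retry arithmetic: in the worst case each round costs retries × (timeout + sleep). A back-of-envelope sketch — the tuned values below are hypothetical, not proposed defaults:

```java
public class RetryBudget {
    // Worst-case wall time of one retry round: attempts * (timeoutMs + sleepMs).
    public static long worstCaseMs(int attempts, long timeoutMs, long sleepMs) {
        return attempts * (timeoutMs + sleepMs);
    }

    public static void main(String[] args) {
        // Observed in the test: 10 retries, 1s timeout, 1s sleep, two rounds.
        System.out.println(2 * worstCaseMs(10, 1000, 1000) + " ms");  // 40000 ms
        // Hypothetical tuned values: 2 retries, 500ms timeout, no sleep.
        System.out.println(2 * worstCaseMs(2, 500, 0) + " ms");       // 2000 ms
    }
}
```

Dropping the retry count and interval via the two IPC keys mentioned above would shrink the budget accordingly, independent of the @BeforeClass change for MiniDFSCluster reuse.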
[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Bota updated HDFS-11187: -- Status: Patch Available (was: Reopened) Patch submitted for branch-2.7. Cherry-picking the commit from branch-2. Conflicts: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FinalizedReplica.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java > Optimize disk access for last partial chunk checksum of Finalized replica > - > > Key: HDFS-11187 > URL: https://issues.apache.org/jira/browse/HDFS-11187 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.2 > > Attachments: HDFS-11187-branch-2.001.patch, > HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, > HDFS-11187-branch-2.004.patch, HDFS-11187-branch-2.7.001.patch, > HDFS-11187.001.patch, HDFS-11187.002.patch, HDFS-11187.003.patch, > HDFS-11187.004.patch, HDFS-11187.005.patch > > > The patch at HDFS-11160 ensures BlockSender reads the correct version of > metafile when there are concurrent writers. > However, the implementation is not optimal, because it must always read the > last partial chunk checksum from disk while holding FsDatasetImpl lock for > every reader. It is possible to optimize this by keeping an up-to-date > version of last partial checksum in-memory and reduce disk access. > I am separating the optimization into a new jira, because maintaining the > state of in-memory checksum requires a lot more work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Bota reopened HDFS-11187: --- Reopened to provide patch for branch-2.7
[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Bota updated HDFS-11187: -- Attachment: HDFS-11187-branch-2.7.001.patch
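The optimization HDFS-11187 describes — keeping the last partial chunk checksum in memory rather than re-reading it from the meta file under the FsDatasetImpl lock — can be sketched as below. This is a hypothetical illustration; the class and method names are invented and are not the actual FinalizedReplica API:

```java
import java.util.zip.CRC32;

// Hypothetical sketch: cache the last partial chunk checksum in memory at
// finalization time so readers do not re-read the meta file on disk while
// holding the dataset lock.
public class PartialChunkChecksumCache {
    private byte[] lastPartialChunkChecksum; // null until set at finalization

    // Writer thread stores the checksum once when the replica is finalized.
    public synchronized void setLastPartialChunkChecksum(byte[] checksum) {
        this.lastPartialChunkChecksum = checksum.clone();
    }

    // Readers hit the in-memory copy; a null result means "fall back to disk".
    public synchronized byte[] getLastPartialChunkChecksum() {
        return lastPartialChunkChecksum == null ? null : lastPartialChunkChecksum.clone();
    }

    // Helper: CRC32 of the partial chunk, stored as 4 big-endian bytes.
    static byte[] crc32Of(byte[] partialChunk) {
        CRC32 crc = new CRC32();
        crc.update(partialChunk, 0, partialChunk.length);
        long v = crc.getValue();
        return new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
    }

    public static void main(String[] args) {
        PartialChunkChecksumCache cache = new PartialChunkChecksumCache();
        cache.setLastPartialChunkChecksum(crc32Of("partial chunk".getBytes()));
        System.out.println(cache.getLastPartialChunkChecksum().length); // 4
    }
}
```

The defensive copies and the synchronized accessors keep concurrent readers safe; the hard part the JIRA calls out is keeping this cached state consistent with on-disk state while writers are active.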
[jira] [Updated] (HDFS-13165) [SPS]: Collects successfully moved block details via IBR
[ https://issues.apache.org/jira/browse/HDFS-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-13165: Attachment: HDFS-13165-HDFS-10285-01.patch > [SPS]: Collects successfully moved block details via IBR > > > Key: HDFS-13165 > URL: https://issues.apache.org/jira/browse/HDFS-13165 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Attachments: HDFS-13165-HDFS-10285-00.patch, > HDFS-13165-HDFS-10285-01.patch > > > This task makes use of the existing IBR to get moved-block details and > removes the unwanted future-tracking logic that exists in the > BlockStorageMovementTracker code; it is no longer needed because file-level > tracking is maintained at the NN itself. > The following comments are taken from HDFS-10285, > [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16347472] > Comment-3) > {quote}BPServiceActor > Is it actually sending back the moved blocks? Aren’t IBRs sufficient?{quote} > Comment-21) > {quote} > BlockStorageMovementTracker > Many data structures are riddled with non-threadsafe race conditions and risk > of CMEs. > Ex. The moverTaskFutures map. Adding new blocks and/or adding to a block's > list of futures is synchronized. However, the run loop does an unsynchronized > block get, unsynchronized future remove, unsynchronized isEmpty, possibly > another unsynchronized get, and only then does it do a synchronized remove of the > block. The whole chunk of code should be synchronized. > Is the problematic moverTaskFutures even needed? It's aggregating futures > per-block for seemingly no reason. Why track all the futures at all instead > of just relying on the completion service? As best I can tell: > It's only used to determine if a future from the completion service should be > ignored during shutdown.
Shutdown sets the running boolean to false and > clears the entire data structure, so why not use the running boolean as a > check just a little further down? > As synchronization, it sleeps up to 2 seconds before performing a blocking > moverCompletionService.take, but only when it thinks there are no active > futures. I'll ignore the missed-notify race that the bounded wait masks, but > the real question is why not just do the blocking take? > Why all the complexity? Am I missing something? > BlocksMovementsStatusHandler > Suffers the same type of thread-safety issues as StoragePolicySatisfyWorker. Ex. > blockIdVsMovementStatus is inconsistently synchronized. It does synchronize to > return an unmodifiable list, which sadly does nothing to protect the caller > from CME. > handle is iterating over a non-thread-safe list. > {quote}
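The reviewer's suggestion — drop the per-block future map and rely on the completion service plus a running flag — can be sketched like this. It is a minimal, hypothetical illustration (the class name and methods are invented), not the SPS code itself:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a tracker that blocks on the completion service
// directly instead of mirroring futures in a separate (race-prone) map.
public class MoveResultTracker {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final CompletionService<String> completion = new ExecutorCompletionService<>(pool);
    private volatile boolean running = true;

    public void submit(Callable<String> moveTask) {
        completion.submit(moveTask);
    }

    // A plain blocking take(): no bounded wait, no shared map, no CME risk.
    public String nextResult() throws Exception {
        Future<String> done = completion.take(); // blocks until some task finishes
        return running ? done.get() : null;      // after shutdown, results are ignored
    }

    public void shutdown() {
        running = false;
        pool.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        MoveResultTracker tracker = new MoveResultTracker();
        tracker.submit(() -> "blk_1001 moved");
        System.out.println(tracker.nextResult());
        tracker.shutdown();
    }
}
```

The completion service already serializes "which task finished next", so no auxiliary bookkeeping structure is needed; the volatile flag alone answers the shutdown question the quoted review raises.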
[jira] [Commented] (HDFS-13070) Ozone: SCM: Support for container replica reconciliation - 1
[ https://issues.apache.org/jira/browse/HDFS-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370415#comment-16370415 ] Anu Engineer commented on HDFS-13070: - +1, the patch looks good to me. Thanks for taking care of this. > Ozone: SCM: Support for container replica reconciliation - 1 > > > Key: HDFS-13070 > URL: https://issues.apache.org/jira/browse/HDFS-13070 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Assignee: Nanda kumar >Priority: Major > Attachments: HDFS-13070-HDFS-7240.000.patch, > HDFS-13070-HDFS-7240.001.patch > > > SCM should process container reports and identify under replicated containers > for re-replication. {{ContainerSupervisor}} should take one NodePool at a > time and start processing the container reports of datanodes in that > NodePool. In this jira we just integrate {{ContainerSupervisor}} into SCM, > actual reconciliation logic will be handled in follow-up jiras. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13170) Port webhdfs unmaskedpermission parameter to HTTPFS
[ https://issues.apache.org/jira/browse/HDFS-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370406#comment-16370406 ] genericqa commented on HDFS-13170: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 57s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 17s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-httpfs: The patch generated 9 new + 397 unchanged - 7 fixed = 406 total (was 404) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 29s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 40s{color} | {color:green} hadoop-hdfs-httpfs in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 54m 10s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13170 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911250/HDFS-13170.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux b2e9ac2d1724 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8896d20 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/23130/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-httpfs.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/23130/artifact/out/whitespace-eol.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23130/testReport/ | | Max. process+thread count | 686 (vs. ulimit of 5500) | | modules | C:
[jira] [Commented] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
[ https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370405#comment-16370405 ] Elek, Marton commented on HDFS-13108: - Final patch has been uploaded. Order of imports and all the assertion messages are fixed (thanks to [~ste...@apache.org]'s comments). Both the contract and normal unit tests are passing. > Ozone: OzoneFileSystem: Simplified url schema for Ozone File System > --- > > Key: HDFS-13108 > URL: https://issues.apache.org/jira/browse/HDFS-13108 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Major > Attachments: HDFS-13108-HDFS-7240.001.patch, > HDFS-13108-HDFS-7240.002.patch, HDFS-13108-HDFS-7240.003.patch, > HDFS-13108-HDFS-7240.005.patch, HDFS-13108-HDFS-7240.006.patch > > > A. Current state > > 1. The datanode host / bucket / volume should be defined in the defaultFS (e.g. > o3://datanode:9864/test/bucket1) > 2. The root file system points to the bucket (e.g. 'dfs -ls /' lists all the > keys from bucket1) > It works very well, but there are some limitations. > B.) Problem one > The current code doesn't support fully qualified locations. For example 'dfs > -ls o3://datanode:9864/test/bucket1/dir1' does not work. > C.) Problem two > I tried to fix the previous problem, but it's not trivial. The biggest > problem is that there is a Path.makeQualified call which transforms an > unqualified URL into a qualified URL. This is part of Path.java, so it's > common to all the Hadoop file systems. > In the current implementation it qualifies a URL by keeping the scheme > (e.g. o3://) and authority (e.g. datanode:9864) from the defaultFS and uses > the relative path as the end of the qualified URL.
For example: > makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) will > return o3://datanode:9864/dir1/file, which is obviously wrong (the correct result > would be o3://datanode:9864/test/bucket1/dir1/file). I tried to do a workaround > using a custom makeQualified in the Ozone code, and it worked from the command > line but didn't work with Spark, which uses the Hadoop API and the > original makeQualified path. > D.) Solution > We should support makeQualified calls, so we can use any path in the > defaultFS. > > I propose to use a simplified schema such as o3://bucket.volume/ > This is similar to the s3a format, where the pattern is s3a://bucket.region/ > We don't need to set the hostname of the datanode (or KSM in case of service > discovery), but it would be configurable with additional Hadoop configuration > values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 > (this is how s3a works today, as far as I know). > We also need to define restrictions for the volume names (in our case they > should no longer include a dot). > ps: some spark output > 2018-02-03 18:43:04 WARN Client:66 - Neither spark.yarn.jars nor > spark.yarn.archive is set, falling back to uploading libraries under > SPARK_HOME. > 2018-02-03 18:43:05 INFO Client:54 - Uploading resource > file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__244044896784490.zip > -> > o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__244044896784490.zip > My default fs was o3://datanode:9864/test/bucket1, but spark qualified the > name of the home directory. >
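The failure mode described above can be reproduced with a few lines of plain Java. This is a hypothetical mock of the qualification behaviour the comment describes, not Hadoop's actual Path.makeQualified implementation:

```java
import java.net.URI;

// Hypothetical illustration: qualification that keeps only the scheme and
// authority of the default FS drops the /test/bucket1 part of the default URI.
public class QualifySketch {
    // Mimics the behaviour described in the comment above, not Hadoop's code.
    static URI qualify(URI defaultFs, String relativePath) {
        // Only scheme + authority survive; the default FS path is discarded.
        return URI.create(defaultFs.getScheme() + "://" + defaultFs.getAuthority()
                + "/" + relativePath);
    }

    public static void main(String[] args) {
        URI fs = URI.create("o3://datanode:9864/test/bucket1");
        // Produces o3://datanode:9864/dir1/file -- the wrong result from the
        // comment, because the bucket path is lost. Encoding bucket.volume in
        // the authority (o3://bucket1.test/) keeps the root path free for
        // real directories, which is the proposed solution.
        System.out.println(qualify(fs, "dir1/file"));
    }
}
```

This is why the o3://bucket.volume/ schema sidesteps the problem: once the bucket and volume live in the authority, qualification cannot strip them.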
[jira] [Updated] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
[ https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton updated HDFS-13108: Attachment: HDFS-13108-HDFS-7240.006.patch
[jira] [Issue Comment Deleted] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs
[ https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-13102: --- Comment: was deleted (was: Thanks Nicholas for the Review. There are some issues for which I feel we should maintain a list maintaining the skip indices . I think its better to have a call sometime tomorrow. If we keep the skipIndices maintained in a list, the power logic will also work.. I do agree that the addFirst method won't work in the current scenario and I would like to discuss with you on this part as how to handle this as this will be called when the nameNode starts up..So it may require a different handling. Let me know in case you are available tomorrow any time. Thanks Shashi On 2/20/18, 11:36 PM, "Tsz Wo Nicholas Sze (JIRA)"wrote: [ https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370378#comment-16370378 ] Tsz Wo Nicholas Sze commented on HDFS-13102: Some more comments: - There seems a bug in addFirst -- it should add at index 0, i.e. skipNodeList.add(0, node). Then, checkAndPromoteIfNeeded() won't work for it. - With remove, we cannot use power to determine the skip indices. I understand that remove() is not implemented here. Are you going to change the computation in combineDiffs() when adding remove()? {code} //combineDiffs() // At each level no of entries to be combined to promote to a // higher level will be equal to skip interval, eg: assuming skip interval // of 4, at level 0, s0, s1 ,s2 and s3 will be combined to form s0-3. // similarly, s4-7, s8-11 and s11-15 will be constructed at level 1. // At level 1, s0-3, s4-7, s8-11, s11-15 will be combined to construct // s0-15 and so on. 
Double power = Math.pow(skipInterval, levelIterator); {code} ) > Implement SnapshotSkipList class to store Multi level DirectoryDiffs > > > Key: HDFS-13102 > URL: https://issues.apache.org/jira/browse/HDFS-13102 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, > HDFS-13102.003.patch > > > HDFS-11225 explains an issue where deletion of older snapshots can take a > very long time in case the no of snapshot diffs is quite large for > directories. For any directory under a snapshot, to construct the children > list, it needs to combine all the diffs from that particular snapshot to the > last snapshotDiff record and reverseApply to the current children list of the > directory on live fs. This can take a significant time if the no of snapshot > diffs are quite large and changes per diff is significant. > This Jira proposes to store the Directory diffs in a SnapshotSkip list, where > we store multi level DirectoryDiffs. At each level, the Directory Diff will > be cumulative diff of k snapshot diffs, > where k is the level of a node in the list. >
[jira] [Issue Comment Deleted] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs
[ https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-13102: --- Comment: was deleted (was: I am holding on to other patches until this gets finalized..Because changing this patch will invariantly change other patches as well. On 2/20/18, 11:36 PM, "Tsz Wo Nicholas Sze (JIRA)"wrote: [ https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370378#comment-16370378 ] Tsz Wo Nicholas Sze commented on HDFS-13102: Some more comments: - There seems a bug in addFirst -- it should add at index 0, i.e. skipNodeList.add(0, node). Then, checkAndPromoteIfNeeded() won't work for it. - With remove, we cannot use power to determine the skip indices. I understand that remove() is not implemented here. Are you going to change the computation in combineDiffs() when adding remove()? {code} //combineDiffs() // At each level no of entries to be combined to promote to a // higher level will be equal to skip interval, eg: assuming skip interval // of 4, at level 0, s0, s1 ,s2 and s3 will be combined to form s0-3. // similarly, s4-7, s8-11 and s11-15 will be constructed at level 1. // At level 1, s0-3, s4-7, s8-11, s11-15 will be combined to construct // s0-15 and so on. Double power = Math.pow(skipInterval, levelIterator); {code} > Implement SnapshotSkipList class to store Multi level DirectoryDiffs > > > Key: HDFS-13102 > URL: https://issues.apache.org/jira/browse/HDFS-13102 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, HDFS-13102.003.patch > > > HDFS-11225 explains an issue where deletion of older snapshots can take a very long time in case the no of snapshot diffs is quite large for directories. 
For any directory under a snapshot, to construct the children list, it needs to combine all the diffs from that particular snapshot to the last snapshotDiff record and reverseApply to the current children list of the directory on live fs. This can take a significant time if the no of snapshot diffs are quite large and changes per diff is significant. > This Jira proposes to store the Directory diffs in a SnapshotSkip list, where we store multi level DirectoryDiffs. At each level, the Directory Diff will be cumulative diff of k snapshot diffs, > where k is the level of a node in the list. > )
[jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs
[ https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370396#comment-16370396 ] Shashikant Banerjee commented on HDFS-13102: I am holding on to other patches until this gets finalized, because changing this patch will invariably change other patches as well. On 2/20/18, 11:36 PM, "Tsz Wo Nicholas Sze (JIRA)" wrote: [ https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370378#comment-16370378 ] Tsz Wo Nicholas Sze commented on HDFS-13102: Some more comments: - There seems to be a bug in addFirst -- it should add at index 0, i.e. skipNodeList.add(0, node). Then, checkAndPromoteIfNeeded() won't work for it. - With remove, we cannot use power to determine the skip indices. I understand that remove() is not implemented here. Are you going to change the computation in combineDiffs() when adding remove()? {code} //combineDiffs() // At each level, the no of entries to be combined to promote to a // higher level will be equal to the skip interval, e.g. assuming a skip interval // of 4, at level 0, s0, s1, s2 and s3 will be combined to form s0-3. // similarly, s4-7, s8-11 and s12-15 will be constructed at level 1. // At level 1, s0-3, s4-7, s8-11, s12-15 will be combined to construct // s0-15 and so on. Double power = Math.pow(skipInterval, levelIterator); {code} > Implement SnapshotSkipList class to store Multi level DirectoryDiffs > > > Key: HDFS-13102 > URL: https://issues.apache.org/jira/browse/HDFS-13102 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, HDFS-13102.003.patch > > > HDFS-11225 explains an issue where deletion of older snapshots can take a very long time in case the no of snapshot diffs is quite large for directories. 
For any directory under a snapshot, to construct the children list , it needs to combine all the diffs from that particular snapshot to the last snapshotDiff record and reverseApply to the current children list of the directory on live fs. This can take a significant time if the no of snapshot diffs are quite large and changes per diff is significant. > This Jira proposes to store the Directory diffs in a SnapshotSkip list, where we store multi level DirectoryDiffs. At each level, the Directory Diff will be cumulative diff of k snapshot diffs, > where k is the level of a node in the list. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) > Implement SnapshotSkipList class to store Multi level DirectoryDiffs > > > Key: HDFS-13102 > URL: https://issues.apache.org/jira/browse/HDFS-13102 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, > HDFS-13102.003.patch > > > HDFS-11225 explains an issue where deletion of older snapshots can take a > very long time in case the no of snapshot diffs is quite large for > directories. For any directory under a snapshot, to construct the children > list , it needs to combine all the diffs from that particular snapshot to the > last snapshotDiff record and reverseApply to the current children list of the > directory on live fs. This can take a significant time if the no of snapshot > diffs are quite large and changes per diff is significant. > This Jira proposes to store the Directory diffs in a SnapshotSkip list, where > we store multi level DirectoryDiffs. At each level, the Directory Diff will > be cumulative diff of k snapshot diffs, > where k is the level of a node in the list. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs
[ https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370391#comment-16370391 ] Shashikant Banerjee commented on HDFS-13102: Thanks, Nicholas, for the review. There are some issues for which I feel we should keep a list maintaining the skip indices. I think it is better to have a call sometime tomorrow. If we keep the skip indices maintained in a list, the power logic will also work. I do agree that the addFirst method won't work in the current scenario, and I would like to discuss with you how to handle this, since it will be called when the NameNode starts up, so it may require different handling. Let me know in case you are available any time tomorrow. Thanks, Shashi
[jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs
[ https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370378#comment-16370378 ] Tsz Wo Nicholas Sze commented on HDFS-13102: Some more comments:
- There seems to be a bug in addFirst -- it should add at index 0, i.e. skipNodeList.add(0, node). Then, checkAndPromoteIfNeeded() won't work for it.
- With remove, we cannot use power to determine the skip indices. I understand that remove() is not implemented here. Are you going to change the computation in combineDiffs() when adding remove()?
{code}
// combineDiffs()
// At each level, the number of entries to be combined for promotion to a
// higher level equals the skip interval. E.g., assuming a skip interval
// of 4, at level 0, s0, s1, s2 and s3 will be combined to form s0-3;
// similarly, s4-7, s8-11 and s12-15 will be constructed at level 1.
// At level 1, s0-3, s4-7, s8-11 and s12-15 will be combined to construct
// s0-15, and so on.
double power = Math.pow(skipInterval, levelIterator);
{code}
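The power-of-skip-interval computation discussed above can be sketched as follows. This is an illustrative model only, not the patch's actual code: `SkipLevels.levelFor` is a hypothetical helper, and it assumes no removals have occurred, which is exactly why remove() breaks a pure power computation once indices shift.

```java
public final class SkipLevels {

  // Level of the node inserted at position `index` (index > 0), assuming
  // no removals: the largest L with index % skipInterval^L == 0.
  static int levelFor(int index, int skipInterval) {
    if (index <= 0) {
      throw new IllegalArgumentException("index must be positive");
    }
    int level = 0;
    long power = skipInterval; // skipInterval^1
    while (index % power == 0) {
      level++;
      power *= skipInterval;
    }
    return level;
  }

  public static void main(String[] args) {
    // With a skip interval of 4: s4, s8, s12 sit at level 1; s16 at level 2.
    System.out.println(levelFor(4, 4));  // 1
    System.out.println(levelFor(16, 4)); // 2
    System.out.println(levelFor(7, 4));  // 0
  }
}
```

After a removal, the remaining nodes' positions no longer match their insertion indices, so `levelFor` would assign the wrong levels; maintaining an explicit index list (or rebalancing on delete) avoids recomputing from powers.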
[jira] [Comment Edited] (HDFS-13168) XmlImageVisitor - Prefer Array over LinkedList
[ https://issues.apache.org/jira/browse/HDFS-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370367#comment-16370367 ] Íñigo Goiri edited comment on HDFS-13168 at 2/20/18 6:02 PM: {{StringBuilder}} is more efficient, but I'm not sure how + on {{String}} is implemented. Anyway, this is just philosophical at this point; it should be good either way. I'm committing [^HDFS-13168.2.patch] to {{trunk}}. [~belugabehr], for the commit message, do I use your name capitalized as it is here, or do you prefer some other spelling?
> XmlImageVisitor - Prefer Array over LinkedList
> Key: HDFS-13168
> URL: https://issues.apache.org/jira/browse/HDFS-13168
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.0.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Minor
> Attachments: HDFS-13168.1.patch, HDFS-13168.2.patch
>
> {{ArrayDeque}}
> {quote}This class is likely to be faster than Stack when used as a stack, and faster than LinkedList when used as a queue.{quote}
> ...not to mention less memory fragmentation (a single backing array vs. many LinkedList nodes).
> https://docs.oracle.com/javase/8/docs/api/java/util/ArrayDeque.html
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
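The {{ArrayDeque}} recommendation quoted above can be illustrated with a minimal sketch; the names here are generic stand-ins, not the actual XmlImageVisitor fields:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public final class DequeDemo {

  // Pops all elements and returns them in LIFO order, comma-separated.
  static String drain(Deque<String> stack) {
    StringBuilder sb = new StringBuilder();
    while (!stack.isEmpty()) {
      if (sb.length() > 0) {
        sb.append(',');
      }
      sb.append(stack.pop());
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // ArrayDeque used as a LIFO stack: per the javadoc, likely faster than
    // Stack and LinkedList, and backed by a single resizable array rather
    // than one node allocation per element.
    Deque<String> elements = new ArrayDeque<>();
    elements.push("fsimage");
    elements.push("inode");
    System.out.println(drain(elements)); // inode,fsimage
  }
}
```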
[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370364#comment-16370364 ] Gabor Bota commented on HDFS-11187: --- Hi [~xkrogen], I'll check if it can be applied easily soon. > Optimize disk access for last partial chunk checksum of Finalized replica > - > > Key: HDFS-11187 > URL: https://issues.apache.org/jira/browse/HDFS-11187 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.2 > > Attachments: HDFS-11187-branch-2.001.patch, > HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, > HDFS-11187-branch-2.004.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, > HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch > > > The patch at HDFS-11160 ensures BlockSender reads the correct version of > metafile when there are concurrent writers. > However, the implementation is not optimal, because it must always read the > last partial chunk checksum from disk while holding FsDatasetImpl lock for > every reader. It is possible to optimize this by keeping an up-to-date > version of last partial checksum in-memory and reduce disk access. > I am separating the optimization into a new jira, because maintaining the > state of in-memory checksum requires a lot more work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs
[ https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370365#comment-16370365 ] Tsz Wo Nicholas Sze commented on HDFS-13102:
> Removes will be handled as a part of HDFS-13171
If remove() is not working yet, please throw UnsupportedOperationException in this JIRA.
> ... Now locating the previous multiLevel node and next multiLevelNode might be a little cumbersome as we need to iterate through the list again and check which node is actually a multiLevel node. ...
Just iterate starting at the deleted element, not over the entire list. Maintaining diffSetIndexList needs extra memory.
> ... We need to have the INodeDirectory passed to DirectoryDiffList, as with the INodeDirectory reference itself, we will be able to read the configured value of SkipInterval.
Passing it in the constructor is fine, but not storing it: that occupies memory for storing the INodeDirectory.
> ... getMinListForRange actually gives a list of childrenDiff (not DirectoryDiffs) ...
Good point! Then, both DiffListByArrayList and unmodifiableList should implement getSumForRange.
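The fail-fast suggestion in the comment above can be sketched as follows; StubDiffList is a hypothetical stand-in, not the actual DirectoryDiffList class, and only the explicit remove() behavior is the point:

```java
import java.util.AbstractList;

// Hypothetical stand-in for the skip-list-backed DirectoryDiffList.
public class StubDiffList<T> extends AbstractList<T> {

  @Override
  public T get(int index) {
    throw new IndexOutOfBoundsException("empty stub: " + index);
  }

  @Override
  public int size() {
    return 0;
  }

  @Override
  public T remove(int index) {
    // Removals are deferred to a follow-up JIRA (HDFS-13171); until then,
    // make any accidental use fail loudly instead of corrupting levels.
    throw new UnsupportedOperationException(
        "remove() is not supported until HDFS-13171");
  }

  public static void main(String[] args) {
    StubDiffList<Integer> diffs = new StubDiffList<>();
    try {
      diffs.remove(0);
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```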
[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters
[ https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370358#comment-16370358 ] Íñigo Goiri commented on HDFS-13119: I added [^HDFS-13119.006.patch] with the committed version to trunk for completeness. As this is technically a bug, I'd like to push it to 2.9.1 and 3.0.1 (or 3.0.2). Any idea what the current state of the branches is? My guess is {{branch-2.9}} and {{branch-3.0}}. [~chris.douglas], which ones would be the right ones for 2.9.1 and 3.0.X?
> RBF: Manage unavailable clusters
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 3.0.0
> Reporter: Íñigo Goiri
> Assignee: Yiqun Lin
> Priority: Major
> Labels: RBF
> Fix For: 3.1.0, 2.10.0, 3.2.0
> Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, HDFS-13119.003.patch, HDFS-13119.004.patch, HDFS-13119.005.patch, HDFS-13119.006.patch
>
> When a federated cluster has one of its subclusters down, operations that run in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC connections.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HDFS-13119) RBF: Manage unavailable clusters
[ https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-13119: Attachment: HDFS-13119.006.patch
[jira] [Comment Edited] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs
[ https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370330#comment-16370330 ] Shashikant Banerjee edited comment on HDFS-13102 at 2/20/18 5:37 PM: Thanks, [~szetszwo], for the review comments.
{code:java}
TestDirectoryDiffList does not test remove(..). As mentioned, remove(..) seems having some bugs.{code}
I agree that TestDirectoryDiffList does not test removes. Removes will be handled as a part of HDFS-13171. Removes need to rebalance the list, and hence I would like to address them in a different Jira.
{code:java}
diffSetIndexList does not seem useful since it is the same as the nodes in level 1. BTW, diffSetIndexList is not updated when remove an element so that it seems a bug. I suggest removing diffSetIndexList since it can be computed if necessary.{code}
diffSetIndexList is a list which maintains the indices of all the multi-level nodes. I think diffSetIndexList should be kept, as otherwise determining the multi-level nodes when a sequence of deletes happens will be cumbersome. For example, let's say we have snapshot diffs stored for a directory as follows, with a skip interval of 3: s0->s1->s2->s3->s4->s5->s6. In this case, s0 and s3 will be multi-level nodes. Let's say s2 gets deleted, followed by s3. Now locating the previous multi-level node and the next multi-level node might be a little cumbersome, as we need to iterate through the list again and check which node is actually a multi-level node. diffSetIndexList will simplify rebalancing the skip list in case of deletions. It will be updated accordingly when we handle deletes in HDFS-13171.
{code:java}
Pass INodeDirectory as a parameter in getSumForRange(..). Then, we could remove INodeDirectory dir from DirectoryDiffList{code}
We need to have the INodeDirectory passed to DirectoryDiffList, as with the INodeDirectory reference itself we will be able to read the configured value of SkipInterval. This will be a part of HDFS-13173.
{code:java}
Let's replace getSumForRange with getMinListForRange in DiffList so that we may implement it DiffListByArrayList using subList.{code}
getMinListForRange actually gives a list of childrenDiff (not DirectoryDiffs, which is the basic element stored in the list). Putting this API in the DiffList interface might not make much sense. In case this API seems suitable as a DiffList interface method, we need to change it to return a list of DirectoryDiffs rather than childrenDiff here. DiffListByArrayList (which will be used to store FileDiffs in general) does not have a childrenDiff element.
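The subList suggestion discussed above can be sketched minimally; ArrayBackedDiffs is a hypothetical stand-in for DiffListByArrayList, with a generic element type instead of the real diff classes:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class ArrayBackedDiffs<T> {
  private final List<T> diffs = new ArrayList<>();

  public void add(T diff) {
    diffs.add(diff);
  }

  // Returns a read-only view of the diffs in [fromIndex, toIndex),
  // without copying: subList is a view over the backing list.
  public List<T> getMinListForRange(int fromIndex, int toIndex) {
    return Collections.unmodifiableList(diffs.subList(fromIndex, toIndex));
  }

  public static void main(String[] args) {
    ArrayBackedDiffs<String> list = new ArrayBackedDiffs<>();
    for (String s : new String[] {"s0", "s1", "s2", "s3"}) {
      list.add(s);
    }
    System.out.println(list.getMinListForRange(1, 3)); // [s1, s2]
  }
}
```

Because subList returns a view rather than a copy, the range query is O(1); wrapping it in unmodifiableList keeps callers from mutating the backing diff list through the view.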
[jira] [Updated] (HDFS-13170) Port webhdfs unmaskedpermission parameter to HTTPFS
[ https://issues.apache.org/jira/browse/HDFS-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen O'Donnell updated HDFS-13170: Affects Version/s: 3.2.0 Status: Patch Available (was: Open)
> Port webhdfs unmaskedpermission parameter to HTTPFS
> Key: HDFS-13170
> URL: https://issues.apache.org/jira/browse/HDFS-13170
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.2.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Attachments: HDFS-13170.001.patch
>
> HDFS-6962 fixed a long-standing issue where default ACLs are not correctly applied to files when they are created from the hadoop shell.
> With this change, if you create a file with default ACLs against the parent directory, with dfs.namenode.posix.acl.inheritance.enabled=false, the result is:
> {code}
> # file: /test_acl/file_from_shell_off
> # owner: user1
> # group: supergroup
> user::rw-
> user:user1:rwx #effective:r--
> user:user2:rwx #effective:r--
> group::r-x #effective:r--
> group:users:rwx #effective:r--
> mask::r--
> other::r--
> {code}
> And if you enable this, to fix the bug above, the result is as you would expect:
> {code}
> # file: /test_acl/file_from_shell
> # owner: user1
> # group: supergroup
> user::rw-
> user:user1:rwx #effective:rw-
> user:user2:rwx #effective:rw-
> group::r-x #effective:r--
> group:users:rwx #effective:rw-
> mask::rw-
> other::r--
> {code}
> If I then create a file over HTTPFS or webHDFS, the behaviour is not the same as above:
> {code}
> # file: /test_acl/default_permissions
> # owner: user1
> # group: supergroup
> user::rwx
> user:user1:rwx #effective:r-x
> user:user2:rwx #effective:r-x
> group::r-x
> group:users:rwx #effective:r-x
> mask::r-x
> other::r-x
> {code}
> Notice the mask is set to r-x, and this removes the write permission on the new file.
> As part of HDFS-6962, a new parameter, 'unmaskedpermission', was added to webhdfs. By passing it to a webhdfs call, it can result in the same behaviour as when a file is written from the CLI:
> {code}
> curl -i -X PUT -T test.txt --header "Content-Type:application/octet-stream" "http://namenode:50075/webhdfs/v1/test_acl/unmasked__770?op=CREATE=user1=namenode:8020=false=770;
> # file: /test_acl/unmasked__770
> # owner: user1
> # group: supergroup
> user::rwx
> user:user1:rwx
> user:user2:rwx
> group::r-x
> group:users:rwx
> mask::rwx
> other::---
> {code}
> However, this parameter was never ported to HTTPFS.
> This Jira is to replicate the same changes to HTTPFS so this parameter is available there too.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370287#comment-16370287 ] Erik Krogen commented on HDFS-11187: Hi [~gabor.bota] / [~xiaochen], thanks for the work! I see that the target version is 2.7.6 but that this was only backported to branch-2.8. Do you plan to put it in branch-2.7? It seems it should go there, given that IIUC HDFS-11160 introduced a performance regression in 2.7.
[jira] [Commented] (HDFS-13167) DatanodeAdminManager Improvements
[ https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370285#comment-16370285 ] BELUGA BEHR commented on HDFS-13167: Test failures are unrelated. I have uploaded a new patch to address the one check-style error. > DatanodeAdminManager Improvements > - > > Key: HDFS-13167 > URL: https://issues.apache.org/jira/browse/HDFS-13167 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: HDFS-13167.1.patch, HDFS-13167.2.patch, > HDFS-13167.3.patch > > > # Use Collection type Set instead of List for tracking nodes > # Fix logging statements that are erroneously appending variables instead of > using parameters > # Miscellaneous small improvements > As an example, the {{node}} variable is being appended to the string instead > of being passed as an argument to the {{trace}} method for variable > substitution. > {code} > LOG.trace("stopDecommission: Node {} in {}, nothing to do." + > node, node.getAdminState()); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
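The concatenation bug described in the issue is easier to see with the substitution spelled out. The helper below is a self-contained stand-in for SLF4J-style `{}` substitution (the real code uses an SLF4J logger; the class name and values here are illustrative, not from the patch):

```java
// Minimal stand-in for SLF4J-style "{}" substitution, to illustrate the bug:
// concatenating a variable into the format string leaves an unused
// placeholder behind and always pays the string-building cost, even when
// TRACE logging is disabled.
public class LogFormatDemo {
    // Replaces each "{}" in the template with the next argument, in order.
    static String format(String template, Object... args) {
        StringBuilder sb = new StringBuilder();
        int from = 0;
        int argIndex = 0;
        int at;
        while ((at = template.indexOf("{}", from)) >= 0 && argIndex < args.length) {
            sb.append(template, from, at).append(args[argIndex++]);
            from = at + 2;
        }
        sb.append(template.substring(from));
        return sb.toString();
    }

    public static void main(String[] args) {
        String node = "dn-1:9866";           // illustrative values
        String state = "DECOMMISSIONED";

        // Buggy pattern from the JIRA: node is appended to the template, so the
        // first "{}" is filled by the admin state and the second is left over.
        String buggy = format("stopDecommission: Node {} in {}, nothing to do." + node, state);

        // Fixed pattern: both values are passed as substitution arguments.
        String fixed = format("stopDecommission: Node {} in {}, nothing to do.", node, state);

        System.out.println(buggy);
        System.out.println(fixed); // stopDecommission: Node dn-1:9866 in DECOMMISSIONED, nothing to do.
    }
}
```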
[jira] [Updated] (HDFS-13167) DatanodeAdminManager Improvements
[ https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-13167: --- Attachment: HDFS-13167.3.patch
[jira] [Updated] (HDFS-13167) DatanodeAdminManager Improvements
[ https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-13167: --- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-13167) DatanodeAdminManager Improvements
[ https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-13167: --- Status: Open (was: Patch Available)
[jira] [Commented] (HDFS-13168) XmlImageVisitor - Prefer Array over LinkedList
[ https://issues.apache.org/jira/browse/HDFS-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370271#comment-16370271 ] BELUGA BEHR commented on HDFS-13168: [~elgoiri] Using the 'char' type is intentional. It's faster to add a 'char' to a StringBuilder than a String. Adding a String requires a _null_ check and a _length_ check. A char cannot be null and its length can only be 1, so it's faster. Please consider this patch for inclusion into the project. Thanks! > XmlImageVisitor - Prefer Array over LinkedList > -- > > Key: HDFS-13168 > URL: https://issues.apache.org/jira/browse/HDFS-13168 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HDFS-13168.1.patch, HDFS-13168.2.patch > > > {{ArrayDeque}} > {quote}This class is likely to be faster than Stack when used as a stack, and > faster than LinkedList when used as a queue.{quote} > .. not to mention less memory fragmentation (single backing array vs. many > LinkedList nodes). > https://docs.oracle.com/javase/8/docs/api/java/util/ArrayDeque.html
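The two points above can be sketched together: an `ArrayDeque` used as the tag stack (one backing array, no per-node allocation), and single-`char` appends wherever the literal is one character long. The names below are illustrative, not the actual XmlImageVisitor code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch: ArrayDeque as a LIFO tag stack, and char appends to
// StringBuilder for one-character literals ('<', '>') instead of Strings.
public class TagStackDemo {
    private final Deque<String> tagQ = new ArrayDeque<>(); // faster than LinkedList here
    private final StringBuilder sb = new StringBuilder();

    void openTag(String tag) {
        tagQ.push(tag);
        sb.append('<').append(tag).append('>');   // '<' and '>' as chars, not "<" / ">"
    }

    void closeTag() {
        sb.append("</").append(tagQ.pop()).append('>');
    }

    String xml() {
        return sb.toString();
    }

    public static void main(String[] args) {
        TagStackDemo d = new TagStackDemo();
        d.openTag("fsimage");
        d.openTag("inode");
        d.closeTag();
        d.closeTag();
        System.out.println(d.xml()); // <fsimage><inode></inode></fsimage>
    }
}
```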
[jira] [Commented] (HDFS-13170) Port webhdfs unmaskedpermission parameter to HTTPFS
[ https://issues.apache.org/jira/browse/HDFS-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370249#comment-16370249 ] Stephen O'Donnell commented on HDFS-13170: -- I have added a patch for this and a couple of tests. I have tried as much as possible to mirror the changes that were done in webhdfs, but the HTTPFS code is quite different so it's not a straight copy and paste. I think the only code paths affected here are CREATE and MKDIRS. None of the other operations create objects that need the permissions applied in HDFS. > Port webhdfs unmaskedpermission parameter to HTTPFS > --- > > Key: HDFS-13170 > URL: https://issues.apache.org/jira/browse/HDFS-13170 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Attachments: HDFS-13170.001.patch > > > HDFS-6962 fixed a long-standing issue where default ACLs are not correctly > applied to files when they are created from the hadoop shell. > With this change, if you create a file with default ACLs against the parent > directory, with dfs.namenode.posix.acl.inheritance.enabled=false, the result > is: > {code} > # file: /test_acl/file_from_shell_off > # owner: user1 > # group: supergroup > user::rw- > user:user1:rwx #effective:r-- > user:user2:rwx #effective:r-- > group::r-x #effective:r-- > group:users:rwx #effective:r-- > mask::r-- > other::r-- > {code} > And if you enable this, to fix the bug above, the result is as you would > expect: > {code} > # file: /test_acl/file_from_shell > # owner: user1 > # group: supergroup > user::rw- > user:user1:rwx #effective:rw- > user:user2:rwx #effective:rw- > group::r-x #effective:r-- > group:users:rwx #effective:rw- > mask::rw- > other::r-- > {code} > If I then create a file over HTTPFS or webHDFS, the behaviour is not the same > as above: > {code} > # file: /test_acl/default_permissions > # owner: user1 > # group: supergroup > user::rwx > user:user1:rwx #effective:r-x > user:user2:rwx #effective:r-x > group::r-x > group:users:rwx #effective:r-x > mask::r-x > other::r-x > {code} > Notice the mask is set to r-x and this removes the write permission on the new > file. > As part of HDFS-6962 a new parameter, 'unmaskedpermission', was added to webhdfs. > By passing it to a webhdfs call, it can result in the > same behaviour as when a file is written from the CLI: > {code} > curl -i -X PUT -T test.txt --header "Content-Type:application/octet-stream" > "http://namenode:50075/webhdfs/v1/test_acl/unmasked__770?op=CREATE&user.name=user1&namenoderpcaddress=namenode:8020&overwrite=false&unmaskedpermission=770" > # file: /test_acl/unmasked__770 > # owner: user1 > # group: supergroup > user::rwx > user:user1:rwx > user:user2:rwx > group::r-x > group:users:rwx > mask::rwx > other::--- > {code} > However, this parameter was never ported to HTTPFS. > This Jira is to replicate the same changes to HTTPFS so this parameter is > available there too.
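The `#effective` annotations in the getfacl output above follow the POSIX ACL mask rule: the effective rights of a named-user, named-group, or owning-group entry are the bitwise AND of that entry with the mask. A tiny self-contained illustration of the rule (not Hadoop code; the values mirror the listings above):

```java
// Sketch of the POSIX ACL mask rule behind the "#effective" annotations:
// effective = entry & mask, applied to named users, named groups, and the
// owning group (not to the owner or to "other").
public class AclMaskDemo {
    static final String[] SYMS = {"---", "--x", "-w-", "-wx", "r--", "r-x", "rw-", "rwx"};

    static int effective(int entry, int mask) {
        return entry & mask;
    }

    static String sym(int bits) {
        return SYMS[bits];
    }

    public static void main(String[] args) {
        int userEntry = 7; // user:user1:rwx
        int maskRX = 5;    // mask::r-x  (what HTTPFS/webhdfs produced without unmaskedpermission)
        int maskRWX = 7;   // mask::rwx  (what unmaskedpermission=770 yields)
        System.out.println("mask r-x -> effective " + sym(effective(userEntry, maskRX)));  // r-x
        System.out.println("mask rwx -> effective " + sym(effective(userEntry, maskRWX))); // rwx
    }
}
```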
[jira] [Updated] (HDFS-13170) Port webhdfs unmaskedpermission parameter to HTTPFS
[ https://issues.apache.org/jira/browse/HDFS-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen O'Donnell updated HDFS-13170: - Attachment: HDFS-13170.001.patch
[jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs
[ https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370199#comment-16370199 ] Tsz Wo Nicholas Sze commented on HDFS-13102: Thanks [~shashikant] for working on this. Some comments on the patch: - Pass INodeDirectory as a parameter in getSumForRange(..). Then, we could remove INodeDirectory dir from DirectoryDiffList. - Let's replace getSumForRange with getMinListForRange in DiffList so that we may implement it in DiffListByArrayList using subList. - diffSetIndexList does not seem useful since it is the same as the nodes in level 1. BTW, diffSetIndexList is not updated when removing an element, so it seems to be a bug. I suggest removing diffSetIndexList since it can be computed if necessary. - TestDirectoryDiffList does not test remove(..). As mentioned, remove(..) seems to have some bugs. > Implement SnapshotSkipList class to store Multi level DirectoryDiffs > > > Key: HDFS-13102 > URL: https://issues.apache.org/jira/browse/HDFS-13102 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, > HDFS-13102.003.patch > > > HDFS-11225 explains an issue where deletion of older snapshots can take a > very long time when the number of snapshot diffs is quite large for > directories. For any directory under a snapshot, to construct the children > list, it needs to combine all the diffs from that particular snapshot to the > last snapshotDiff record and reverse-apply them to the current children list of the > directory on the live fs. This can take significant time if the number of snapshot > diffs is quite large and the changes per diff are significant. > This Jira proposes to store the DirectoryDiffs in a SnapshotSkipList, where > we store multi-level DirectoryDiffs. At each level, the DirectoryDiff will > be the cumulative diff of k snapshot diffs, > where k is the level of a node in the list.
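The `getMinListForRange` suggestion above can be sketched for the plain ArrayList-backed case, where the minimal list of diffs covering a range is simply a `subList` view; a skip-list-backed implementation could instead return fewer, pre-combined multi-level diffs for the same range. The class and method names below are illustrative, not the actual patch:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of getMinListForRange for an ArrayList-backed DiffList:
// with no multi-level combined diffs available, the minimal list for the
// range is the range itself, returned as a subList view (no copying).
public class DiffListSketch<T> {
    private final List<T> diffs;

    DiffListSketch(List<T> diffs) {
        this.diffs = diffs;
    }

    // Minimal list of diffs whose combination covers [fromIndex, toIndex).
    // A SnapshotSkipList version would return pre-combined higher-level
    // diffs here, shortening the list the caller must reverse-apply.
    List<T> getMinListForRange(int fromIndex, int toIndex) {
        return diffs.subList(fromIndex, toIndex);
    }

    public static void main(String[] args) {
        DiffListSketch<String> list =
            new DiffListSketch<>(Arrays.asList("d0", "d1", "d2", "d3", "d4"));
        System.out.println(list.getMinListForRange(1, 4)); // [d1, d2, d3]
    }
}
```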
[jira] [Resolved] (HDFS-13169) Ambari UI deploy fails during startup of Ambari Metrics
[ https://issues.apache.org/jira/browse/HDFS-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee resolved HDFS-13169. --- Resolution: Invalid > Ambari UI deploy fails during startup of Ambari Metrics > --- > > Key: HDFS-13169 > URL: https://issues.apache.org/jira/browse/HDFS-13169 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Aravindan Vijayan >Priority: Major > > {noformat} > HDP version:HDP-3.0.0.0-702 > Ambari version: 2.99.99.0-77 > {noformat} > /var/lib/ambari-agent/data/errors-52.txt: > {noformat} > Traceback (most recent call last): > File > "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", > line 90, in > AmsCollector().execute() > File > "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", > line 371, in execute > self.execute_prefix_function(self.command_name, 'post', env) > File > "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", > line 392, in execute_prefix_function > method(env) > File > "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", > line 434, in post_start > raise Fail("Pid file {0} doesn't exist after starting of the > component.".format(pid_file)) > resource_management.core.exceptions.Fail: Pid file > /var/run/ambari-metrics-collector//hbase-ams-master.pid doesn't exist after > starting of the component. 
> {noformat} > /var/lib/ambari-agent/data/output-52.txt: > {noformat} > 2018-01-11 13:03:40,753 - Stack Feature Version Info: Cluster Stack=3.0, > Command Stack=None, Command Version=3.0.0.0-702 -> 3.0.0.0-702 > 2018-01-11 13:03:40,755 - Using hadoop conf dir: > /usr/hdp/3.0.0.0-702/hadoop/conf > 2018-01-11 13:03:40,884 - Stack Feature Version Info: Cluster Stack=3.0, > Command Stack=None, Command Version=3.0.0.0-702 -> 3.0.0.0-702 > 2018-01-11 13:03:40,885 - Using hadoop conf dir: > /usr/hdp/3.0.0.0-702/hadoop/conf > 2018-01-11 13:03:40,886 - Group['hdfs'] {} > 2018-01-11 13:03:40,887 - Group['hadoop'] {} > 2018-01-11 13:03:40,887 - Group['users'] {} > 2018-01-11 13:03:40,887 - User['hive'] {'gid': 'hadoop', > 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} > 2018-01-11 13:03:40,890 - User['infra-solr'] {'gid': 'hadoop', > 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} > 2018-01-11 13:03:40,891 - User['zookeeper'] {'gid': 'hadoop', > 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} > 2018-01-11 13:03:40,892 - User['atlas'] {'gid': 'hadoop', > 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} > 2018-01-11 13:03:40,893 - User['ams'] {'gid': 'hadoop', > 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} > 2018-01-11 13:03:40,893 - User['ambari-qa'] {'gid': 'hadoop', > 'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'users'], 'uid': None} > 2018-01-11 13:03:40,894 - User['kafka'] {'gid': 'hadoop', > 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} > 2018-01-11 13:03:40,894 - User['tez'] {'gid': 'hadoop', > 'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'users'], 'uid': None} > 2018-01-11 13:03:40,895 - User['hdfs'] {'gid': 'hadoop', > 'fetch_nonlocal_groups': True, 'groups': ['hdfs', 'hadoop'], 'uid': None} > 2018-01-11 13:03:40,895 - User['yarn'] {'gid': 'hadoop', > 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} > 2018-01-11 13:03:40,896 - 
User['mapred'] {'gid': 'hadoop', > 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} > 2018-01-11 13:03:40,897 - User['hbase'] {'gid': 'hadoop', > 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} > 2018-01-11 13:03:40,897 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] > {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555} > 2018-01-11 13:03:40,898 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh > ambari-qa > /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa > 0'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'} > 2018-01-11 13:03:40,903 - Skipping > Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa > /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa > 0'] due to not_if > 2018-01-11 13:03:40,903 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', > 'create_parents': True, 'mode': 0775, 'cd_access': 'a'} > 2018-01-11 13:03:40,904 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] > {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555} > 2018-01-11 13:03:40,905 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] > {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555} > 2018-01-11 13:03:40,906 -
[jira] [Commented] (HDFS-12070) Failed block recovery leaves files open indefinitely and at risk for data loss
[ https://issues.apache.org/jira/browse/HDFS-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370138#comment-16370138 ] Daryn Sharp commented on HDFS-12070: Back when I filed, I played around with a fix and didn't use close=false. I too read the append design. It reads as if the PD is supposed to obtain a new genstamp and retry, but I don't think a DN can do that. The reasoning for another round of commit sync wasn't explained. Perhaps it was due to the earlier implementation or concerns over concurrent commit syncs, but the recovery id feature should allow the NN to weed out prior commit syncs. My concern is the NN has claimed the lease during commit sync. Append, truncate, and non-overwrite creates will trigger an implicit commit sync. Normally it completes almost immediately, roughly up to the heartbeat interval, and the client succeeds on retry. If another round of commit sync is required due to close=false, the client can re-trigger commit sync after the soft lease period (5 mins) – I don't think a client does or should retry for that long. Which means the operation will unnecessarily fail. Also, it will take up to the hard lease period (1 hour) for the NN to fix the under-replication. In either case (close=true/false), the NN has removed the failed DNs from the expected locations. Bad blocks should be invalidated if/when "failed" DNs block report in the wrong genstamp and/or size, so I think it's safe for the PD to ignore failed nodes and close? > Failed block recovery leaves files open indefinitely and at risk for data loss > -- > > Key: HDFS-12070 > URL: https://issues.apache.org/jira/browse/HDFS-12070 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Kihwal Lee >Priority: Major > Attachments: HDFS-12070.0.patch, lease.patch > > > Files will remain open indefinitely if block recovery fails, which creates a > high risk of data loss.
The replication monitor will not replicate these > blocks. > The NN provides the primary node a list of candidate nodes for recovery, which > involves a 2-stage process. The primary node removes any candidates that > cannot init replica recovery (essentially alive and knows about the block) to > create a sync list. Stage 2 issues updates to the sync list – _but fails if > any node fails_, unlike the first stage. The NN should be informed of nodes > that did succeed. > Manual recovery will also fail until the problematic node is temporarily > stopped, so that a connection-refused error induces the bad node to be pruned from > the candidates. Recovery then succeeds, the lease is released, under-replication > is fixed, and the block is invalidated from the bad node.
[jira] [Updated] (HDFS-13113) Use Log.*(Object, Throwable) overload to log exceptions
[ https://issues.apache.org/jira/browse/HDFS-13113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Bokor updated HDFS-13113: Attachment: HADOOP-10571-branch-3.0.002.patch > Use Log.*(Object, Throwable) overload to log exceptions > --- > > Key: HDFS-13113 > URL: https://issues.apache.org/jira/browse/HDFS-13113 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, nfs >Affects Versions: 2.4.0 >Reporter: Steve Loughran >Assignee: Andras Bokor >Priority: Major > Attachments: HADOOP-10571-branch-3.0.002.patch, > HADOOP-10571.05.patch, HADOOP-10571.07.patch > > > FYI, In HADOOP-10571, [~boky01] is going to clean up a lot of the log > statements, including some in Datanode and elsewhere. > I'm provisionally +1 on that, but want to run it on the standalone tests > (Yetus has already done them), and give the HDFS developers warning of a > change which is going to touch their codebase. > If anyone doesn't want the logging improvements, now is your chance to say so -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Fajth reassigned HDFS-13174: --- Assignee: Istvan Fajth > hdfs mover -p /path times out after 20 min > -- > > Key: HDFS-13174 > URL: https://issues.apache.org/jira/browse/HDFS-13174 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2 >Reporter: Istvan Fajth >Assignee: Istvan Fajth >Priority: Major > Fix For: 3.1.0, 3.0.1, 2.8.4, 2.7.6 > > > HDFS-11015 introduced an iteration timeout in the Dispatcher.Source > class that is checked while dispatching the moves that the Balancer and the > Mover do. This timeout is hardwired to 20 minutes. > The Balancer works in iterations, and even if an iteration times out, > the Balancer runs further and does another iteration before it fails, if > no moves happened in the last few iterations. > The Mover, on the other hand, does not have iterations, so if moving a path > runs for more than 20 minutes, the Mover will stop with the > following exception reported to the console (line numbers might differ, as this > exception came from a CDH 5.12.1 installation): > java.io.IOException: Block move timed out > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748)
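One natural direction for a fix to the report above is to make the hardwired 20-minute limit configurable, keeping 20 minutes as the default. The sketch below uses a plain map as a stand-in for Hadoop's Configuration, and the key name is illustrative only, not necessarily what any eventual patch uses:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: read the Mover/Balancer iteration timeout from configuration
// instead of hardwiring it. The config key below is a hypothetical name for
// illustration; a Map stands in for org.apache.hadoop.conf.Configuration.
public class MoverTimeoutSketch {
    static final long DEFAULT_MAX_ITERATION_TIME_MS = 20 * 60 * 1000L; // the current 20 min

    static long maxIterationTimeMs(Map<String, String> conf) {
        String v = conf.get("dfs.mover.max-iteration-time"); // illustrative key name
        return v == null ? DEFAULT_MAX_ITERATION_TIME_MS : Long.parseLong(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(maxIterationTimeMs(conf));        // 1200000 (default: 20 min)
        conf.put("dfs.mover.max-iteration-time", "3600000"); // raise to 1 hour
        System.out.println(maxIterationTimeMs(conf));        // 3600000
    }
}
```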
[jira] [Created] (HDFS-13174) hdfs mover -p /path times out after 20 min
Istvan Fajth created HDFS-13174: --- Summary: hdfs mover -p /path times out after 20 min Key: HDFS-13174 URL: https://issues.apache.org/jira/browse/HDFS-13174 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 3.0.0-alpha2, 2.7.4, 2.8.0 Reporter: Istvan Fajth Fix For: 3.1.0, 3.0.1, 2.8.4, 2.7.6