[jira] [Commented] (HDFS-13443) RBF: Update mount table cache immediately after changing (add/update/remove) mount table entries.
[ https://issues.apache.org/jira/browse/HDFS-13443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461959#comment-16461959 ] genericqa commented on HDFS-13443: -- -1 overall
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| trunk Compile Tests ||
| 0 | mvndep | 0m 28s | Maven dependency ordering for branch |
| +1 | mvninstall | 25m 34s | trunk passed |
| +1 | compile | 16m 18s | trunk passed |
| +1 | checkstyle | 1m 1s | trunk passed |
| +1 | mvnsite | 1m 35s | trunk passed |
| +1 | shadedclient | 12m 31s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 38s | trunk passed |
| +1 | javadoc | 1m 17s | trunk passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 26s | the patch passed |
| +1 | compile | 16m 7s | the patch passed |
| +1 | cc | 16m 7s | the patch passed |
| +1 | javac | 16m 7s | the patch passed |
| +1 | checkstyle | 0m 57s | the patch passed |
| +1 | mvnsite | 1m 25s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 10m 57s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 52s | the patch passed |
| +1 | javadoc | 1m 16s | the patch passed |
|| Other Tests ||
| -1 | unit | 78m 49s | hadoop-hdfs in the patch failed. |
| +1 | unit | 14m 37s | hadoop-hdfs-rbf in the patch passed. |
| +1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
| | | 190m 0s | |
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
| | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
| | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration |
| | hadoop.hdfs.client.impl.TestBlockReaderLocal |
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13443 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12921669/HDFS-13443.007.patch |
| Optional Tests |
[jira] [Commented] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461956#comment-16461956 ] Chao Sun commented on HDFS-13286: - Attached patch v5. [~xkrogen]: I didn't find anything else besides those in your comments. There are several places where STANDBY is checked in the Yarn RM, but I think we should not need to change them as it is not possible to get into OBSERVER state in the RM. > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286-HDFS-12943.000.patch, > HDFS-13286-HDFS-12943.001.patch, HDFS-13286-HDFS-12943.002.patch, > HDFS-13286-HDFS-12943.003.patch, HDFS-13286-HDFS-12943.004.patch, > HDFS-13286-HDFS-12943.005.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
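The transition rules discussed in this thread (initially only observer-to-standby and standby-to-observer are allowed) can be sketched as a toy state machine. This is an illustration only, not the real haadmin/HAServiceState code; the class and method names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Toy model of the HA transitions discussed above. The real haadmin command
// drives the NameNode's HA state over RPC; names here are illustrative.
public class HaTransitionSketch {
    enum State { ACTIVE, STANDBY, OBSERVER }

    // Initially only standby <-> observer transitions are supported.
    static final Map<State, Set<State>> ALLOWED = Map.of(
        State.STANDBY, EnumSet.of(State.OBSERVER),
        State.OBSERVER, EnumSet.of(State.STANDBY),
        State.ACTIVE, EnumSet.noneOf(State.class));

    static State transitionToObserver(State current) {
        if (!ALLOWED.get(current).contains(State.OBSERVER)) {
            throw new IllegalStateException("cannot enter OBSERVER from " + current);
        }
        return State.OBSERVER;
    }

    public static void main(String[] args) {
        System.out.println(transitionToObserver(State.STANDBY)); // prints OBSERVER
    }
}
```

This also mirrors the point about the Yarn RM: since nothing in the RM can reach the OBSERVER state, its STANDBY checks need no change.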
[jira] [Updated] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-13286: Attachment: HDFS-13286-HDFS-12943.005.patch > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286-HDFS-12943.000.patch, > HDFS-13286-HDFS-12943.001.patch, HDFS-13286-HDFS-12943.002.patch, > HDFS-13286-HDFS-12943.003.patch, HDFS-13286-HDFS-12943.004.patch, > HDFS-13286-HDFS-12943.005.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-13286: Attachment: (was: HDFS-13286-HDFS-12943.005.patch) > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286-HDFS-12943.000.patch, > HDFS-13286-HDFS-12943.001.patch, HDFS-13286-HDFS-12943.002.patch, > HDFS-13286-HDFS-12943.003.patch, HDFS-13286-HDFS-12943.004.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-13286: Attachment: HDFS-13286-HDFS-12943.005.patch > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286-HDFS-12943.000.patch, > HDFS-13286-HDFS-12943.001.patch, HDFS-13286-HDFS-12943.002.patch, > HDFS-13286-HDFS-12943.003.patch, HDFS-13286-HDFS-12943.004.patch, > HDFS-13286-HDFS-12943.005.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-16) Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition.
[ https://issues.apache.org/jira/browse/HDDS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461801#comment-16461801 ] Bharat Viswanadham edited comment on HDDS-16 at 5/3/18 4:22 AM: [~msingh] I have one question: in the current Ozone code, the CHAINED replication type is not supported. If CHAINED is supported in the future, I think the pipeline information might need to be sent to the datanode. The client tries to connect to the leader in the pipeline; then, internally, the datanode might need the pipeline information to connect to the other datanodes and replicate. This is my understanding, not completely sure. Please correct me if I am missing something here. was (Author: bharatviswa): [~msingh] I have one question, in current ozone code, CHAINED replication type is not supported. In future, if CHAINED will be supported, I think we might need pipeline information need to be sent to datanode. Client tries to connector leader in the pipeline, then internally in datanode we might need pipeline information to connect to other data nodes and replicate. This is my understanding, not completely sure correct. Please correct me if i am missing something here. > Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition. > > > Key: HDDS-16 > URL: https://issues.apache.org/jira/browse/HDDS-16 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Native, Ozone Datanode >Affects Versions: 0.2.1 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-16.001.patch > > > The current Ozone code passes pipeline information to datanodes as well. > However datanodes do not use this information. > Hence Pipeline should be removed from ozone datanode commands. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13186) [PROVIDED Phase 2] Multipart Multinode uploader API + Implementations
[ https://issues.apache.org/jira/browse/HDFS-13186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461912#comment-16461912 ] Chris Douglas commented on HDFS-13186: -- Thanks for the updated patch. Minor comments still apply, but let's work out the high-level first. bq. MPU is distinct from serial copy since we can upload the data out of order. I don't see the use case for a version that breaks this. In this case, I think the boiler plate is correct: try to make an MPU; if we can't then fall back to FileSystem::write Fair enough. If the caller controls the partitioning, then it can also decide whether the copy should be serial or parallel (e.g., for small files). However, this also requires the caller to implement recovery paths for both cases. So a user of this library would need to persist their {{UploadHandle}} to abort the call for a MPU, and bespoke metadata to recover failed, serial copies. That's not unreasonable, but it means that the client will probably fall back either to copy-in-place/renaming to promote completed output, which (as we've seen with S3/WASB and HDFS) doesn't have a good, default answer. If the MPU were just a U, then this library could also handle promotion in a FS-efficient way. All that said, if this is not intended as a small framework, but rather a common API to parallel upload machinery across S3/WASB and HDFS, then this is perfectly fine. To be a general framework, this would also need a partitioning strategy (with overrides) to generate splits, which resolve to {{PartHandle}} objects. Probably not worth it. bq. So the numbering of the parts would be internal to the part handle when downcasting in the different types. This means we could no longer rely on the fact that it's just a String and would then have to introduce all the protobuf machinery. That's possible; just wanted to highlight it before I go away and implement that. bq. 
the filePath needs to follow the upload around because various nodes that are calling put part will need to know what type of MultipartUploader to make bq. If we need to serialize anything, I think we should have it be explicitly done through protobuf As with {{PathHandle}} implementations, PB might be useful, but not required. But your point is well taken. Since this would be passed across multiple nodes that may include different versions of the impl's MPU on the classpath, that could create some unnecessarily complicated logic for the MPU implementer. Let's stick with the approach in v004. > [PROVIDED Phase 2] Multipart Multinode uploader API + Implementations > - > > Key: HDFS-13186 > URL: https://issues.apache.org/jira/browse/HDFS-13186 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ewan Higgs >Assignee: Ewan Higgs >Priority: Major > Attachments: HDFS-13186.001.patch, HDFS-13186.002.patch, > HDFS-13186.003.patch, HDFS-13186.004.patch > > > To write files in parallel to an external storage system as in HDFS-12090, > there are two approaches: > # Naive approach: use a single datanode per file that copies blocks locally > as it streams data to the external service. This requires a copy for each > block inside the HDFS system and then a copy for the block to be sent to the > external system. > # Better approach: Single point (e.g. Namenode or SPS style external client) > and Datanodes coordinate in a multipart - multinode upload. > This system needs to work with multiple back ends and needs to coordinate > across the network. 
So we propose an API that resembles the following: > {code:java} > public UploadHandle multipartInit(Path filePath) throws IOException; > public PartHandle multipartPutPart(InputStream inputStream, > int partNumber, UploadHandle uploadId) throws IOException; > public void multipartComplete(Path filePath, > List<Pair<Integer, PartHandle>> handles, > UploadHandle multipartUploadId) throws IOException;{code} > Here, UploadHandle and PartHandle are opaque handles in the vein of > PathHandle so they can be serialized and deserialized in the hadoop-hdfs project > without knowledge of how to deserialize e.g. S3A's version of an UploadHandle > and PartHandle. > In an object store such as S3A, the implementation is straightforward. In > the case of writing multipart/multinode to HDFS, we can write each block as a > file part. The complete call will perform a concat on the blocks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
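The init/putPart/complete shape of the proposed API can be illustrated with a toy, in-memory uploader. This is a sketch only: handles are modeled as plain Strings (the proposal treats them as opaque, serializable objects), the storage is a map rather than S3A or HDFS blocks, and all class names here are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory uploader mirroring the proposed init/putPart/complete API.
public class InMemoryMultipartUploader {
    private final Map<String, Map<Integer, byte[]>> uploads = new HashMap<>();
    private final Map<String, byte[]> completed = new HashMap<>();
    private int nextId = 0;

    public String multipartInit(String filePath) {
        String uploadId = filePath + "#" + (nextId++);
        uploads.put(uploadId, new HashMap<>());
        return uploadId;
    }

    public String multipartPutPart(InputStream in, int partNumber, String uploadId)
            throws IOException {
        byte[] data = in.readAllBytes();
        uploads.get(uploadId).put(partNumber, data);
        return uploadId + "/part-" + partNumber; // stand-in for an opaque PartHandle
    }

    public void multipartComplete(String filePath, String uploadId) throws IOException {
        Map<Integer, byte[]> parts = uploads.remove(uploadId);
        // Concatenate parts in part-number order, as the HDFS impl would concat blocks.
        List<Integer> order = new ArrayList<>(parts.keySet());
        order.sort(Comparator.naturalOrder());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int n : order) out.write(parts.get(n));
        completed.put(filePath, out.toByteArray());
    }

    public byte[] read(String filePath) { return completed.get(filePath); }

    public static void main(String[] args) throws IOException {
        InMemoryMultipartUploader up = new InMemoryMultipartUploader();
        String id = up.multipartInit("/out/file");
        // Parts may arrive out of order from different nodes.
        up.multipartPutPart(new ByteArrayInputStream("world".getBytes()), 2, id);
        up.multipartPutPart(new ByteArrayInputStream("hello ".getBytes()), 1, id);
        up.multipartComplete("/out/file", id);
        System.out.println(new String(up.read("/out/file"))); // prints "hello world"
    }
}
```

The main method shows why MPU differs from a serial copy: part 2 is uploaded before part 1, and complete() reorders by part number.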
[jira] [Commented] (HDFS-13507) RBF: Remove update functionality from routeradmin's add cmd
[ https://issues.apache.org/jira/browse/HDFS-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461902#comment-16461902 ] Yiqun Lin commented on HDFS-13507: -- Agreed, feel free to go ahead, :). > RBF: Remove update functionality from routeradmin's add cmd > --- > > Key: HDFS-13507 > URL: https://issues.apache.org/jira/browse/HDFS-13507 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Wei Yan >Assignee: Gang Li >Priority: Minor > Labels: incompatible > Attachments: HDFS-13507.000.patch, HDFS-13507.001.patch, > HDFS-13507.002.patch > > > Follow up the discussion in HDFS-13326. We should remove the "update" > functionality from routeradmin's add cmd, to make it consistent with RPC > calls. > Note that: this is an incompatible change. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13245) RBF: State store DBMS implementation
[ https://issues.apache.org/jira/browse/HDFS-13245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461863#comment-16461863 ] Yiqun Lin commented on HDFS-13245: -- {quote} Let's add TestStateStoreDisabledNameserviceStore in a separate JIRA. {quote} This is tracked in HDFS-13525. > RBF: State store DBMS implementation > > > Key: HDFS-13245 > URL: https://issues.apache.org/jira/browse/HDFS-13245 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: maobaolong >Assignee: Yiran Wu >Priority: Major > Attachments: HDFS-13245.001.patch, HDFS-13245.002.patch, > HDFS-13245.003.patch, HDFS-13245.004.patch, HDFS-13245.005.patch, > HDFS-13245.006.patch > > > Add a DBMS implementation for the State Store. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-13525) RBF: Add unit test TestStateStoreDisabledNameserviceStore
Yiqun Lin created HDFS-13525: Summary: RBF: Add unit test TestStateStoreDisabledNameserviceStore Key: HDFS-13525 URL: https://issues.apache.org/jira/browse/HDFS-13525 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 3.0.1 Reporter: Yiqun Lin Assignee: Yiqun Lin -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13443) RBF: Update mount table cache immediately after changing (add/update/remove) mount table entries.
[ https://issues.apache.org/jira/browse/HDFS-13443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Arshad updated HDFS-13443: --- Attachment: HDFS-13443.007.patch > RBF: Update mount table cache immediately after changing (add/update/remove) > mount table entries. > - > > Key: HDFS-13443 > URL: https://issues.apache.org/jira/browse/HDFS-13443 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs >Reporter: Mohammad Arshad >Assignee: Mohammad Arshad >Priority: Major > Labels: RBF > Attachments: HDFS-13443-branch-2.001.patch, > HDFS-13443-branch-2.002.patch, HDFS-13443.001.patch, HDFS-13443.002.patch, > HDFS-13443.003.patch, HDFS-13443.004.patch, HDFS-13443.005.patch, > HDFS-13443.006.patch, HDFS-13443.007.patch > > > Currently the mount table cache is updated periodically; by default the cache is > updated every minute. After a change in the mount table, user operations may still > use the old mount table. This is a bit wrong. > To update the mount table cache, maybe we can do the following: > * *Add a refresh API in MountTableManager which will update the mount table cache.* > * *When there is a change in mount table entries, the router admin server can > update its cache and ask other routers to update their caches*. For example, if > there are three routers R1, R2, R3 in a cluster, then the add mount table entry API, > at the admin server side, will perform the following sequence of actions: > ## user submits an add mount table entry request on R1 > ## R1 adds the mount table entry in the state store > ## R1 calls the refresh API on R2 > ## R1 calls the refresh API on R3 > ## R1 directly refreshes its cache > ## the add mount table entry response is sent back to the user. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13443) RBF: Update mount table cache immediately after changing (add/update/remove) mount table entries.
[ https://issues.apache.org/jira/browse/HDFS-13443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461843#comment-16461843 ] Mohammad Arshad commented on HDFS-13443: Thanks [~elgoiri] for the reviews. Fixed all the above comments; submitting patch HDFS-13443.007.patch. To fix the checkstyle issue in RouterAdmin, two local variables were changed to inline arguments. > RBF: Update mount table cache immediately after changing (add/update/remove) > mount table entries. > - > > Key: HDFS-13443 > URL: https://issues.apache.org/jira/browse/HDFS-13443 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs >Reporter: Mohammad Arshad >Assignee: Mohammad Arshad >Priority: Major > Labels: RBF > Attachments: HDFS-13443-branch-2.001.patch, > HDFS-13443-branch-2.002.patch, HDFS-13443.001.patch, HDFS-13443.002.patch, > HDFS-13443.003.patch, HDFS-13443.004.patch, HDFS-13443.005.patch, > HDFS-13443.006.patch > > > Currently the mount table cache is updated periodically; by default the cache is > updated every minute. After a change in the mount table, user operations may still > use the old mount table. This is a bit wrong. > To update the mount table cache, maybe we can do the following: > * *Add a refresh API in MountTableManager which will update the mount table cache.* > * *When there is a change in mount table entries, the router admin server can > update its cache and ask other routers to update their caches*. For example, if > there are three routers R1, R2, R3 in a cluster, then the add mount table entry API, > at the admin server side, will perform the following sequence of actions: > ## user submits an add mount table entry request on R1 > ## R1 adds the mount table entry in the state store > ## R1 calls the refresh API on R2 > ## R1 calls the refresh API on R3 > ## R1 directly refreshes its cache > ## the add mount table entry response is sent back to the user. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
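The refresh fan-out described in the HDFS-13443 issue (write to the state store, then ask every router to refresh) can be sketched as a toy model. This is not the actual RBF code: the class names, the map-based state store, and the synchronous in-process fan-out are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the proposed refresh fan-out: a shared "state store" plus
// per-router mount-table caches. addEntry() persists the entry and then
// asks every router (itself included) to refresh, so no router serves a
// stale mount table until the next periodic poll.
public class MountTableRefreshSketch {
    static final Map<String, String> stateStore = new HashMap<>();
    static final List<Router> routers = new ArrayList<>();

    static class Router {
        final Map<String, String> cache = new HashMap<>();

        void refresh() {                          // the proposed refresh API
            cache.clear();
            cache.putAll(stateStore);
        }

        void addEntry(String src, String dest) {
            stateStore.put(src, dest);            // step 2: persist in the state store
            for (Router r : routers) r.refresh(); // steps 3-5: fan out to R1, R2, R3
        }
    }

    public static void main(String[] args) {
        Router r1 = new Router(), r2 = new Router(), r3 = new Router();
        routers.add(r1); routers.add(r2); routers.add(r3);
        r1.addEntry("/data", "ns1 -> /data");
        // Every cache sees the new entry immediately.
        System.out.println(r2.cache.containsKey("/data")); // prints true
    }
}
```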
[jira] [Updated] (HDFS-13524) Occasional "All datanodes are bad" error in TestLargeBlock#testLargeBlockSize
[ https://issues.apache.org/jira/browse/HDFS-13524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-13524: --- Description: TestLargeBlock#testLargeBlockSize may fail with error: {quote} All datanodes [DatanodeInfoWithStorage[127.0.0.1:44968,DS-acddd79e-cdf1-4ac5-aac5-e804a2e61600,DISK]] are bad. Aborting... {quote} Tracing back, the error is due to the stress applied to the host sending a 2GB block, causing write pipeline ack read timeout: {quote} 2017-09-10 22:16:07,285 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO datanode.DataNode (DataXceiver.java:writeBlock(742)) - Receiving BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 src: /127.0.0.1:57794 dest: /127.0.0.1:44968 2017-09-10 22:16:50,402 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] WARN datanode.DataNode (BlockReceiver.java:flushOrSync(434)) - Slow flushOrSync took 5383ms (threshold=300ms), isSync:false, flushTotalNanos=5383638982ns, volume=file:/tmp/tmp.1oS3ZfDCwq/src/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/ 2017-09-10 22:17:54,427 [ResponseProcessor for block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001] WARN hdfs.DataStreamer (DataStreamer.java:run(1214)) - Exception for BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:57794 remote=/127.0.0.1:44968] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118) at java.io.FilterInputStream.read(FilterInputStream.java:83) at java.io.FilterInputStream.read(FilterInputStream.java:83) at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:434) at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213) at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1104) 2017-09-10 22:17:54,432 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO datanode.DataNode (BlockReceiver.java:receiveBlock(1000)) - Exception for BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 java.io.IOException: Connection reset by peer {quote} Instead of raising read timeout, I suggest increasing cluster size from default=1 to 3, so that it has the opportunity to choose a different DN and retry. Suspect this fails after HDFS-13103, in Hadoop 2.8/3.0.0-alpha1 when we introduced client acknowledgement read timeout. was: TestLargeBlock#testLargeBlockSize may fail with error: {quote} All datanodes [DatanodeInfoWithStorage[127.0.0.1:44968,DS-acddd79e-cdf1-4ac5-aac5-e804a2e61600,DISK]] are bad. Aborting... 
{quote} Tracing back, the error is due to the stress applied to the host sending a 2GB block, causing write pipeline ack read timeout: {quote} 2017-09-10 22:16:07,285 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO datanode.DataNode (DataXceiver.java:writeBlock(742)) - Receiving BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 src: /127.0.0.1:57794 dest: /127.0.0.1:44968 2017-09-10 22:16:50,402 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] WARN datanode.DataNode (BlockReceiver.java:flushOrSync(434)) - Slow flushOrSync took 5383ms (threshold=300ms), isSync:false, flushTotalNanos=5383638982ns, volume=file:/tmp/tmp.1oS3ZfDCwq/src/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/ 2017-09-10 22:17:54,427 [ResponseProcessor for block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001] WARN hdfs.DataStreamer (DataStreamer.java:run(1214)) - Exception for BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:57794 remote=/127.0.0.1:44968] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at
[jira] [Updated] (HDFS-13524) Occasional "All datanodes are bad" error in TestLargeBlock#testLargeBlockSize
[ https://issues.apache.org/jira/browse/HDFS-13524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-13524: --- Description: TestLargeBlock#testLargeBlockSize may fail with error: {quote} All datanodes [DatanodeInfoWithStorage[127.0.0.1:44968,DS-acddd79e-cdf1-4ac5-aac5-e804a2e61600,DISK]] are bad. Aborting... {quote} Tracing back, the error is due to the stress applied to the host sending a 2GB block, causing write pipeline ack read timeout: {quote} 2017-09-10 22:16:07,285 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO datanode.DataNode (DataXceiver.java:writeBlock(742)) - Receiving BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 src: /127.0.0.1:57794 dest: /127.0.0.1:44968 2017-09-10 22:16:50,402 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] WARN datanode.DataNode (BlockReceiver.java:flushOrSync(434)) - Slow flushOrSync took 5383ms (threshold=300ms), isSync:false, flushTotalNanos=5383638982ns, volume=file:/tmp/tmp.1oS3ZfDCwq/src/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/ 2017-09-10 22:17:54,427 [ResponseProcessor for block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001] WARN hdfs.DataStreamer (DataStreamer.java:run(1214)) - Exception for BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:57794 remote=/127.0.0.1:44968] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118) at java.io.FilterInputStream.read(FilterInputStream.java:83) at java.io.FilterInputStream.read(FilterInputStream.java:83) at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:434) at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213) at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1104) 2017-09-10 22:17:54,432 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO datanode.DataNode (BlockReceiver.java:receiveBlock(1000)) - Exception for BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 java.io.IOException: Connection reset by peer {quote} Instead of raising read timeout, I suggest increasing cluster size from default=1 to 3, so that it has the opportunity to choose a different DN and resend. Suspect this fails after HDFS-13103, in Hadoop 2.8/3.0.0-alpha1 when we introduced client acknowledgement read timeout. > Occasional "All datanodes are bad" error in TestLargeBlock#testLargeBlockSize > - > > Key: HDFS-13524 > URL: https://issues.apache.org/jira/browse/HDFS-13524 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Wei-Chiu Chuang >Priority: Major > > TestLargeBlock#testLargeBlockSize may fail with error: > {quote} > All datanodes > [DatanodeInfoWithStorage[127.0.0.1:44968,DS-acddd79e-cdf1-4ac5-aac5-e804a2e61600,DISK]] > are bad. Aborting... 
> {quote} > Tracing back, the error is due to the stress applied to the host sending a > 2GB block, causing write pipeline ack read timeout: > {quote} > 2017-09-10 22:16:07,285 [DataXceiver for client > DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block > BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO > datanode.DataNode (DataXceiver.java:writeBlock(742)) - Receiving > BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 src: > /127.0.0.1:57794 dest: /127.0.0.1:44968 > 2017-09-10 22:16:50,402 [DataXceiver for client > DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block > BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] WARN > datanode.DataNode (BlockReceiver.java:flushOrSync(434)) - Slow flushOrSync > took 5383ms (threshold=300ms), isSync:false, flushTotalNanos=5383638982ns, > volume=file:/tmp/tmp.1oS3ZfDCwq/src/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/ > 2017-09-10 22:17:54,427 [ResponseProcessor for block >
[jira] [Updated] (HDFS-13524) Occasional "All datanodes are bad" error in TestLargeBlock#testLargeBlockSize
[ https://issues.apache.org/jira/browse/HDFS-13524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-13524: --- Environment: (was: TestLargeBlock#testLargeBlockSize may fail with error: {quote} All datanodes [DatanodeInfoWithStorage[127.0.0.1:44968,DS-acddd79e-cdf1-4ac5-aac5-e804a2e61600,DISK]] are bad. Aborting... {quote} Tracing back, the error is due to the stress applied to the host sending a 2GB block, causing write pipeline ack read timeout: {quote} 2017-09-10 22:16:07,285 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO datanode.DataNode (DataXceiver.java:writeBlock(742)) - Receiving BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 src: /127.0.0.1:57794 dest: /127.0.0.1:44968 2017-09-10 22:16:50,402 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] WARN datanode.DataNode (BlockReceiver.java:flushOrSync(434)) - Slow flushOrSync took 5383ms (threshold=300ms), isSync:false, flushTotalNanos=5383638982ns, volume=file:/tmp/tmp.1oS3ZfDCwq/src/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/ 2017-09-10 22:17:54,427 [ResponseProcessor for block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001] WARN hdfs.DataStreamer (DataStreamer.java:run(1214)) - Exception for BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:57794 remote=/127.0.0.1:44968] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118) at java.io.FilterInputStream.read(FilterInputStream.java:83) at java.io.FilterInputStream.read(FilterInputStream.java:83) at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:434) at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213) at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1104) 2017-09-10 22:17:54,432 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO datanode.DataNode (BlockReceiver.java:receiveBlock(1000)) - Exception for BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 java.io.IOException: Connection reset by peer {quote} Instead of raising read timeout, I suggest increasing cluster size from default=1 to 3, so that it has the opportunity to choose a different DN and resend. Suspect this fails after HDFS-13103, in Hadoop 2.8/3.0.0-alpha1 when we introduced client acknowledgement read timeout.) > Occasional "All datanodes are bad" error in TestLargeBlock#testLargeBlockSize > - > > Key: HDFS-13524 > URL: https://issues.apache.org/jira/browse/HDFS-13524 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Wei-Chiu Chuang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13524) Occasional "All datanodes are bad" error in TestLargeBlock#testLargeBlockSize
[ https://issues.apache.org/jira/browse/HDFS-13524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-13524: --- Affects Version/s: 2.8.0 3.0.0-alpha1 > Occasional "All datanodes are bad" error in TestLargeBlock#testLargeBlockSize > - > > Key: HDFS-13524 > URL: https://issues.apache.org/jira/browse/HDFS-13524 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 > Environment: TestLargeBlock#testLargeBlockSize may fail with error: > {quote} > All datanodes > [DatanodeInfoWithStorage[127.0.0.1:44968,DS-acddd79e-cdf1-4ac5-aac5-e804a2e61600,DISK]] > are bad. Aborting... > {quote} > Tracing back, the error is due to the stress applied to the host sending a > 2GB block, causing write pipeline ack read timeout: > {quote} > 2017-09-10 22:16:07,285 [DataXceiver for client > DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block > BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO > datanode.DataNode (DataXceiver.java:writeBlock(742)) - Receiving > BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 src: > /127.0.0.1:57794 dest: /127.0.0.1:44968 > 2017-09-10 22:16:50,402 [DataXceiver for client > DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block > BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] WARN > datanode.DataNode (BlockReceiver.java:flushOrSync(434)) - Slow flushOrSync > took 5383ms (threshold=300ms), isSync:false, flushTotalNanos=5383638982ns, > volume=file:/tmp/tmp.1oS3ZfDCwq/src/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/ > 2017-09-10 22:17:54,427 [ResponseProcessor for block > BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001] WARN > hdfs.DataStreamer (DataStreamer.java:run(1214)) - Exception for > BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 > java.net.SocketTimeoutException: 65000 millis timeout while waiting for > channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected > local=/127.0.0.1:57794 remote=/127.0.0.1:44968] > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118) > at java.io.FilterInputStream.read(FilterInputStream.java:83) > at java.io.FilterInputStream.read(FilterInputStream.java:83) > at > org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:434) > at > org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213) > at > org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1104) > 2017-09-10 22:17:54,432 [DataXceiver for client > DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block > BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO > datanode.DataNode (BlockReceiver.java:receiveBlock(1000)) - Exception for > BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 > java.io.IOException: Connection reset by peer > {quote} > Instead of raising read timeout, I suggest increasing cluster size from > default=1 to 3, so that it has the opportunity to choose a different DN and > resend. > Suspect this fails after HDFS-13103, in Hadoop 2.8/3.0.0-alpha1 when we > introduced client acknowledgement read timeout. >Reporter: Wei-Chiu Chuang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-13524) Occasional "All datanodes are bad" error in TestLargeBlock#testLargeBlockSize
Wei-Chiu Chuang created HDFS-13524: -- Summary: Occasional "All datanodes are bad" error in TestLargeBlock#testLargeBlockSize Key: HDFS-13524 URL: https://issues.apache.org/jira/browse/HDFS-13524 Project: Hadoop HDFS Issue Type: Bug Environment: TestLargeBlock#testLargeBlockSize may fail with error: {quote} All datanodes [DatanodeInfoWithStorage[127.0.0.1:44968,DS-acddd79e-cdf1-4ac5-aac5-e804a2e61600,DISK]] are bad. Aborting... {quote} Tracing back, the error is due to the stress applied to the host sending a 2GB block, causing write pipeline ack read timeout: {quote} 2017-09-10 22:16:07,285 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO datanode.DataNode (DataXceiver.java:writeBlock(742)) - Receiving BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 src: /127.0.0.1:57794 dest: /127.0.0.1:44968 2017-09-10 22:16:50,402 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] WARN datanode.DataNode (BlockReceiver.java:flushOrSync(434)) - Slow flushOrSync took 5383ms (threshold=300ms), isSync:false, flushTotalNanos=5383638982ns, volume=file:/tmp/tmp.1oS3ZfDCwq/src/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/ 2017-09-10 22:17:54,427 [ResponseProcessor for block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001] WARN hdfs.DataStreamer (DataStreamer.java:run(1214)) - Exception for BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:57794 remote=/127.0.0.1:44968] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118) at java.io.FilterInputStream.read(FilterInputStream.java:83) at java.io.FilterInputStream.read(FilterInputStream.java:83) at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:434) at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213) at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1104) 2017-09-10 22:17:54,432 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO datanode.DataNode (BlockReceiver.java:receiveBlock(1000)) - Exception for BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 java.io.IOException: Connection reset by peer {quote} Instead of raising read timeout, I suggest increasing cluster size from default=1 to 3, so that it has the opportunity to choose a different DN and resend. Suspect this fails after HDFS-13103, in Hadoop 2.8/3.0.0-alpha1 when we introduced client acknowledgement read timeout. Reporter: Wei-Chiu Chuang -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
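The fix proposed in HDFS-13524 above — growing the test cluster from one datanode to three — works because pipeline recovery can only substitute a datanode when a spare one exists. A minimal, Hadoop-free sketch of that selection logic (method and node names here are illustrative, not the actual DataStreamer API):

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class PipelineRecoverySketch {
    // Pick a live datanode that has not already failed for this block.
    // Returns empty when every live node is excluded -- the
    // "All datanodes ... are bad. Aborting..." situation from the report.
    static Optional<String> pickReplacement(List<String> liveNodes, Set<String> excluded) {
        for (String dn : liveNodes) {
            if (!excluded.contains(dn)) {
                return Optional.of(dn);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> bad = Set.of("dn1");
        // With a single-datanode cluster, a timed-out pipeline is fatal.
        System.out.println(pickReplacement(List.of("dn1"), bad));
        // With three datanodes the client can rebuild the pipeline and resend.
        System.out.println(pickReplacement(List.of("dn1", "dn2", "dn3"), bad));
    }
}
```

With three datanodes, the client gets the opportunity to choose a different DN on ack timeout instead of aborting, which is exactly why the ticket prefers a larger cluster over a longer read timeout.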
[jira] [Commented] (HDFS-13081) Datanode#checkSecureConfig should allow SASL and privileged HTTP
[ https://issues.apache.org/jira/browse/HDFS-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461812#comment-16461812 ] Anu Engineer commented on HDFS-13081: - There are many existing HDFS clusters where wildcard certs are used. :( For example, some vendors document the use of wildcard certs. I am concerned that this patch does not consider that scenario, which is quite popular in the wild and opens up lots of existing clusters to new security threats. > Datanode#checkSecureConfig should allow SASL and privileged HTTP > > > Key: HDFS-13081 > URL: https://issues.apache.org/jira/browse/HDFS-13081 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, security >Affects Versions: 3.0.0 >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Fix For: 3.1.0, 3.0.3 > > Attachments: HDFS-13081.000.patch, HDFS-13081.001.patch, > HDFS-13081.002.patch, HDFS-13081.003.patch, HDFS-13081.004.patch, > HDFS-13081.005.patch, HDFS-13081.006.patch > > > Datanode#checkSecureConfig currently checks the following to determine if > secure datanode is enabled. > # The server has bound to privileged ports for RPC and HTTP via > SecureDataNodeStarter. > # The configuration enables SASL on DataTransferProtocol and HTTPS (no plain > HTTP) for the HTTP server. > Authentication of the Datanode RPC server can be done either via SASL handshake > or a JSVC/privileged RPC port. > This guarantees authentication of the datanode RPC server before a client > transmits a secret, such as a block access token. > Authentication of the HTTP server can also be done either via HTTPS/SSL or a > JSVC/privileged HTTP port. This guarantees authentication of the datanode HTTP > server before a client transmits a secret, such as a delegation token. > This ticket is open to allow privileged HTTP as an alternative to HTTPS to > work with SASL-based RPC protection. > > cc: [~cnauroth], [~daryn], [~jnpandey] for additional feedback. 
> -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
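The combinations enumerated in the HDFS-13081 description reduce to: the datanode is acceptably secure when each channel (RPC and HTTP) is protected either by a privileged port or by SASL/HTTPS respectively. A dependency-free sketch of that decision — the real Datanode#checkSecureConfig takes a Configuration and has more inputs; the flag names below are illustrative:

```java
public class SecureConfigSketch {
    // RPC is authenticated via a SASL handshake OR a privileged (JSVC) RPC port.
    // HTTP is authenticated via HTTPS-only OR a privileged HTTP port --
    // "SASL RPC + privileged HTTP" is the combination this ticket allows.
    static boolean isSecureDatanodeConfig(boolean privilegedRpcPort, boolean saslEnabled,
                                          boolean httpsOnly, boolean privilegedHttpPort) {
        boolean rpcSecure = privilegedRpcPort || saslEnabled;
        boolean httpSecure = httpsOnly || privilegedHttpPort;
        return rpcSecure && httpSecure;
    }

    public static void main(String[] args) {
        // SASL RPC + privileged HTTP: the case this ticket opens up.
        System.out.println(isSecureDatanodeConfig(false, true, false, true));
        // Neither channel protected: rejected.
        System.out.println(isSecureDatanodeConfig(false, false, false, false));
    }
}
```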
[jira] [Commented] (HDDS-16) Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition.
[ https://issues.apache.org/jira/browse/HDDS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461801#comment-16461801 ] Bharat Viswanadham commented on HDDS-16: [~msingh] I have one question: in the current Ozone code, the CHAINED replication type is not supported. If CHAINED is supported in the future, I think the pipeline information might need to be sent to the datanode. The client tries to connect to the leader in the pipeline; then, internally, the datanode might need the pipeline information to connect to the other datanodes and replicate. This is my understanding, but I'm not completely sure it's correct. Please correct me if I am missing something here. > Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition. > > > Key: HDDS-16 > URL: https://issues.apache.org/jira/browse/HDDS-16 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Native, Ozone Datanode >Affects Versions: 0.2.1 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-16.001.patch > > > The current Ozone code passes pipeline information to datanodes as well. > However, datanodes do not use this information. > Hence, Pipeline should be removed from ozone datanode commands. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13481) TestRollingFileSystemSinkWithHdfs#testFlushThread: test failed intermittently
[ https://issues.apache.org/jira/browse/HDFS-13481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461788#comment-16461788 ] Hudson commented on HDFS-13481: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14115 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14115/]) HDFS-13481. TestRollingFileSystemSinkWithHdfs#testFlushThread: test (templedf: rev 87c23ef643393c39e8353ca9f495b0c8f97cdbd9) * (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/sink/RollingFileSystemSinkTestBase.java > TestRollingFileSystemSinkWithHdfs#testFlushThread: test failed intermittently > - > > Key: HDFS-13481 > URL: https://issues.apache.org/jira/browse/HDFS-13481 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Major > Fix For: 3.2.0, 3.1.1, 3.0.3 > > Attachments: HDFS-13481.001.patch > > > The test fails very rarely on a laptop, but very commonly during Jenkins runs. > {noformat} > Error Message > Flush thread did not run within 10 seconds > Stacktrace > java.lang.AssertionError: Flush thread did not run within 10 seconds > at > org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs.testFlushThread(TestRollingFileSystemSinkWithHdfs.java:291) > {noformat} > According to my test, this breaks about 0.3% locally. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13481) TestRollingFileSystemSinkWithHdfs#testFlushThread: test failed intermittently
[ https://issues.apache.org/jira/browse/HDFS-13481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-13481: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.3 3.1.1 3.2.0 Status: Resolved (was: Patch Available) Thanks for the patch, [~gabor.bota]. Committed to trunk, branch-3.1 and branch-3.0. I've lost track of the branch structure lately. Anywhere else I need to commit it? > TestRollingFileSystemSinkWithHdfs#testFlushThread: test failed intermittently > - > > Key: HDFS-13481 > URL: https://issues.apache.org/jira/browse/HDFS-13481 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Major > Fix For: 3.2.0, 3.1.1, 3.0.3 > > Attachments: HDFS-13481.001.patch > > > The test fails very rarely on a laptop, but very commonly during Jenkins runs. > {noformat} > Error Message > Flush thread did not run within 10 seconds > Stacktrace > java.lang.AssertionError: Flush thread did not run within 10 seconds > at > org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs.testFlushThread(TestRollingFileSystemSinkWithHdfs.java:291) > {noformat} > According to my test, this breaks about 0.3% locally. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13522) Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461766#comment-16461766 ] Erik Krogen commented on HDFS-13522: Thanks for your helpful comments [~elgoiri]! {quote} It would be handy to have a setup of the MiniDFSCluster that sets OBSERVER NNs automatically and having some predefined contract that makes sure that requests are going to the OBSERVER. Is there something along those lines already available in HDFS-12943? {quote} Not yet, but we should definitely add that. Created HDFS-13523. Just FYI, no one is planning on working on this ticket soon; it was just created to make sure it is not forgotten. > Support observer node from Router-Based Federation > -- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Priority: Major > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-13523) Support observer nodes in MiniDFSCluster
Erik Krogen created HDFS-13523: -- Summary: Support observer nodes in MiniDFSCluster Key: HDFS-13523 URL: https://issues.apache.org/jira/browse/HDFS-13523 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, test Reporter: Erik Krogen MiniDFSCluster should support Observer nodes so that we can write decent integration tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13522) Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461763#comment-16461763 ] Íñigo Goiri commented on HDFS-13522: There are two main changes to do: * Collect the OBSERVER state in {{NamenodeHeartbeatService}} and store it in the {{MembershipStore}} through the {{FederationNamenodeServiceState}} * Allow the {{RouterRpcClient}} to pick OBSERVER NNs to perform the operations. Both changes should be fairly easy to add. It would be handy to have a setup of the MiniDFSCluster that sets OBSERVER NNs automatically and having some predefined contract that makes sure that requests are going to the OBSERVER. Is there something along those lines already available in HDFS-12943? > Support observer node from Router-Based Federation > -- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Priority: Major > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
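The second change Íñigo lists — letting the RouterRpcClient pick OBSERVER NNs — amounts to a state-aware selection policy: prefer an observer for reads, fall back to the active otherwise. A hedged, Hadoop-free sketch of that policy (the actual RouterRpcClient works against MembershipStore registrations; the shape below is illustrative):

```java
import java.util.Map;
import java.util.Optional;

public class ObserverSelectionSketch {
    enum ServiceState { ACTIVE, STANDBY, OBSERVER }

    // For a read call, prefer an OBSERVER namenode if one is registered;
    // otherwise (and for all writes) use the ACTIVE namenode.
    static Optional<String> pickNamenode(Map<String, ServiceState> registrations, boolean isRead) {
        if (isRead) {
            Optional<String> observer = firstWithState(registrations, ServiceState.OBSERVER);
            if (observer.isPresent()) {
                return observer;
            }
        }
        return firstWithState(registrations, ServiceState.ACTIVE);
    }

    private static Optional<String> firstWithState(Map<String, ServiceState> regs, ServiceState wanted) {
        return regs.entrySet().stream()
            .filter(e -> e.getValue() == wanted)
            .map(Map.Entry::getKey)
            .sorted() // deterministic choice for the sketch
            .findFirst();
    }

    public static void main(String[] args) {
        Map<String, ServiceState> regs = Map.of(
            "nn1", ServiceState.ACTIVE,
            "nn2", ServiceState.OBSERVER);
        System.out.println(pickNamenode(regs, true));   // read goes to the observer
        System.out.println(pickNamenode(regs, false));  // write goes to the active
    }
}
```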
[jira] [Commented] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461761#comment-16461761 ] Chao Sun commented on HDFS-13286: - Yes that makes sense. Will update the error message. > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286-HDFS-12943.000.patch, > HDFS-13286-HDFS-12943.001.patch, HDFS-13286-HDFS-12943.002.patch, > HDFS-13286-HDFS-12943.003.patch, HDFS-13286-HDFS-12943.004.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461756#comment-16461756 ] Erik Krogen commented on HDFS-13286: For {{FailoverController#preFailoverChecks()}}, I agree with your logic, but we need to update the error message here:
{code}
if (!toSvcStatus.getState().equals(HAServiceState.STANDBY)) {
  throw new FailoverFailedException(
      "Can't failover to an active service");
}
{code}
Currently, if you tried to fail over to OBSERVER, you would get the message that you can't fail over to an active service. We should update it to "Can't failover to a service not in standby state" or something like that. Agreed with leaving {{FederationNamenodeServiceState}} for a follow-on JIRA targeted at RBF support. Created HDFS-13522. > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286-HDFS-12943.000.patch, > HDFS-13286-HDFS-12943.001.patch, HDFS-13286-HDFS-12943.002.patch, > HDFS-13286-HDFS-12943.003.patch, HDFS-13286-HDFS-12943.004.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
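The fix Erik is asking for — checking the target's state explicitly instead of assuming "not STANDBY means ACTIVE" — can be sketched without Hadoop. The real check lives in FailoverController#preFailoverChecks and throws FailoverFailedException; the exception type and method name below are stand-ins:

```java
public class PreFailoverCheckSketch {
    enum HAServiceState { ACTIVE, STANDBY, OBSERVER }

    // Only a STANDBY node is a valid failover target. Report ACTIVE and
    // OBSERVER with distinct, accurate messages rather than assuming that
    // anything that is not STANDBY must be ACTIVE.
    static void checkFailoverTarget(HAServiceState toSvcState) {
        switch (toSvcState) {
            case ACTIVE:
                throw new IllegalStateException("Can't failover to an active service");
            case OBSERVER:
                throw new IllegalStateException("Can't failover to an observer service");
            case STANDBY:
                break; // valid failover target
        }
    }

    public static void main(String[] args) {
        checkFailoverTarget(HAServiceState.STANDBY); // passes silently
        try {
            checkFailoverTarget(HAServiceState.OBSERVER);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```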
[jira] [Created] (HDFS-13522) Support observer node from Router-Based Federation
Erik Krogen created HDFS-13522: -- Summary: Support observer node from Router-Based Federation Key: HDFS-13522 URL: https://issues.apache.org/jira/browse/HDFS-13522 Project: Hadoop HDFS Issue Type: Sub-task Components: federation, namenode Reporter: Erik Krogen Changes will need to occur to the router to support the new observer node. One such change will be to make the router understand the observer state, e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461691#comment-16461691 ] Chao Sun commented on HDFS-13286: - [~xkrogen] Thanks for the great comments! {quote}Take a look at BPServiceActor L905-909. I haven't looked at HDFS-9917 yet, but judging from the comment it seems we probably want to change this if-condition to state == HAServiceState.STANDBY || state == HAServiceState.OBSERVER. Can you see if you agree? {quote} Yes, I think this also applies to OBSERVER. Will fix. {quote}The constructor of StandbyState needs to be modified to super(isObserver ? HAServiceState.OBSERVER : HAServiceState.STANDBY); instead of just always passing STANDBY, else things which use HAState#getServiceState() will be wrong (see for example NameNode#getServiceStatus()). {quote} Good catch! Fixed. {quote}It looks like we also need to update the enums in FederationNamenodeServiceState and NNHAStatusHeartbeatProto, and their associated usages {quote} Yes. I'm going to just use {{FederationNamenodeServiceState.STANDBY}} for {{HAServiceState.OBSERVER}} for now. Supporting Observer in RBF will be done in a separate JIRA. For {{NNHAStatusHeartbeatProto}}, I found I need to add an {{OBSERVER}} state to it. {quote}We should be able to remove the TODO on NameNode L1866? {quote} Done. {quote}I think this section of FailoverController#preFailoverChecks() may need some work: ... It seems the first if-condition is assuming there are only two possible states, so if the state is not STANDBY, it must be ACTIVE. I think we should update this to explicitly check for ACTIVE. Next, if the service is in OBSERVER state, isReadyToBecomeActive() will be false. In this case, FailoverController#preFailoverChecks() will still allow this operation if forceActive is true. I don't think we want to allow forceActive to attempt to failover an observer, right? {quote} Hmm... 
For the first if-condition, an exception will be thrown if the target service is **not** STANDBY. This should be good for the OBSERVER case, right (we don't want the failover target to be OBSERVER)? If this is the case, then the following checks only apply to STANDBY, so nothing needs to change. {quote}For all three usages of FSNamesystem#isInStandbyState(), it actually seems to me that they should apply if it is in observer or standby state, can you double check and if so update accordingly? {quote} Yes, you are right. Will change that to include OBSERVER too. Thanks again! It seems I missed a lot of places where {{HAServiceState.STANDBY}} is used... :-/ Will double-check that and submit another patch. > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286-HDFS-12943.000.patch, > HDFS-13286-HDFS-12943.001.patch, HDFS-13286-HDFS-12943.002.patch, > HDFS-13286-HDFS-12943.003.patch, HDFS-13286-HDFS-12943.004.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
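The recurring pattern in the exchange above — every place that tests state == HAServiceState.STANDBY must also accept OBSERVER — reduces to one predicate. A minimal sketch (the method name is illustrative; BPServiceActor and FSNamesystem each inline their own version of this check):

```java
public class StandbyCheckSketch {
    enum HAServiceState { ACTIVE, STANDBY, OBSERVER }

    // An observer behaves like a standby for these checks: neither should
    // be treated as the active namenode.
    static boolean isStandbyOrObserver(HAServiceState state) {
        return state == HAServiceState.STANDBY || state == HAServiceState.OBSERVER;
    }

    public static void main(String[] args) {
        System.out.println(isStandbyOrObserver(HAServiceState.OBSERVER)); // treated like standby
        System.out.println(isStandbyOrObserver(HAServiceState.ACTIVE));   // not
    }
}
```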
[jira] [Commented] (HDFS-13429) libhdfs++ Expose a C++ logging API
[ https://issues.apache.org/jira/browse/HDFS-13429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461684#comment-16461684 ] genericqa commented on HDFS-13429: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 7m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 59m 24s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 15m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 8m 1s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 30s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m 56s{color} | {color:green} hadoop-hdfs-native-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}130m 26s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | HDFS-13429 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12921613/HDFS-13429.000.patch | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux 1282120f8be0 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6b63a0a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/24127/artifact/out/whitespace-eol.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/24127/testReport/ | | Max. process+thread count | 340 (vs. ulimit of 1) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: hadoop-hdfs-project/hadoop-hdfs-native-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/24127/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > libhdfs++ Expose a C++ logging API > -- > > Key: HDFS-13429 > URL: https://issues.apache.org/jira/browse/HDFS-13429 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer >Priority: Critical > Attachments: HDFS-13429.000.patch > > > The libhdfs++ C API supports taking function pointers for log plugins and > defines levels for components and severity.
[jira] [Resolved] (HDFS-6589) TestDistributedFileSystem.testAllWithNoXmlDefaults failed intermittently
[ https://issues.apache.org/jira/browse/HDFS-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang resolved HDFS-6589. --- Resolution: Cannot Reproduce Resolving as cannot reproduce. The last time I saw this bug was two years ago. Most likely it was a real bug that was fixed later. > TestDistributedFileSystem.testAllWithNoXmlDefaults failed intermittently > > > Key: HDFS-6589 > URL: https://issues.apache.org/jira/browse/HDFS-6589 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.5.0 >Reporter: Yongjun Zhang >Assignee: Wei-Chiu Chuang >Priority: Major > Labels: flaky-test > > https://builds.apache.org/job/PreCommit-HDFS-Build/7207 is clean > https://builds.apache.org/job/PreCommit-HDFS-Build/7208 has the following > failure. The code is essentially the same. > Running the same test locally doesn't reproduce. A flaky test there. > {code} > Stacktrace > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSClient(TestDistributedFileSystem.java:263) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testAllWithNoXmlDefaults(TestDistributedFileSystem.java:651) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-13521) NFS Gateway should support impersonation
Wei-Chiu Chuang created HDFS-13521: -- Summary: NFS Gateway should support impersonation Key: HDFS-13521 URL: https://issues.apache.org/jira/browse/HDFS-13521 Project: Hadoop HDFS Issue Type: Bug Reporter: Wei-Chiu Chuang Similar to HDFS-10481, NFS gateway and httpfs are independent processes that accept client connections. NFS Gateway currently solves the file permission/ownership problem by running as the HDFS super user and then calling setOwner() to change the file owner. This is not desirable: # it adds additional RPC load on the NameNode. # it does not support at-rest encryption, because by design the HDFS super user cannot access KMS. This is yet another problem around KMS ACLs. [~xiaochen] [~rushabh.shah] thoughts? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
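For context on the HDFS-10481-style alternative: services that impersonate clients are normally authorized through proxy-user settings in core-site.xml rather than running as the super user. A minimal sketch follows; the gateway user name "nfsserver" and the host value are illustrative assumptions, not values from this issue.

```xml
<!-- core-site.xml (sketch): authorize the gateway's service user to
     impersonate end users, instead of running it as the HDFS super user.
     "nfsserver" and the host name below are illustrative assumptions. -->
<property>
  <name>hadoop.proxyuser.nfsserver.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.nfsserver.hosts</name>
  <value>nfs-gateway-host.example.com</value>
</property>
```

With such a configuration the gateway can create requests on behalf of the connecting user, so no setOwner() fixup (and no super-user KMS access) is needed.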
[jira] [Commented] (HDDS-6) Enable SCM kerberos auth
[ https://issues.apache.org/jira/browse/HDDS-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461581#comment-16461581 ] genericqa commented on HDDS-6: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 22s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 52s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-ozone/integration-test {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 6s{color} | {color:red} hadoop-hdds/common in trunk has 1 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 48s{color} | {color:red} hadoop-hdds/server-scm in trunk has 1 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 58s{color} | {color:red} hadoop-ozone/common in trunk has 2 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 29s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} shellcheck {color} | {color:red} 0m 0s{color} | {color:red} The patch generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 36s{color} | {color:green} There were no new shelldocs issues. 
{color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-ozone/integration-test {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 34s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 14s{color} | {color:green} common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 38s{color} | {color:green}
[jira] [Commented] (HDDS-15) Add memory profiler support to Genesis
[ https://issues.apache.org/jira/browse/HDDS-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461579#comment-16461579 ] Hudson commented on HDDS-15: SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14114 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14114/]) HDDS-15. Add memory profiler support to Genesis. Contributed by Anu (aengineer: rev 6b63a0af9b29c231166d9af50d499a246cbbb755) * (edit) hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/genesis/Genesis.java * (add) hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/genesis/GenesisMemoryProfiler.java > Add memory profiler support to Genesis > -- > > Key: HDDS-15 > URL: https://issues.apache.org/jira/browse/HDDS-15 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Tools >Affects Versions: 0.2.1 >Reporter: Anu Engineer >Assignee: Anu Engineer >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-15.001.patch, HDDS-15.002.patch, HDDS-15.003.patch > > > Add the ability to sample max memory usage when running tests under Genesis. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13429) libhdfs++ Expose a C++ logging API
[ https://issues.apache.org/jira/browse/HDFS-13429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-13429: --- Status: Patch Available (was: Open) > libhdfs++ Expose a C++ logging API > -- > > Key: HDFS-13429 > URL: https://issues.apache.org/jira/browse/HDFS-13429 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer >Priority: Critical > Attachments: HDFS-13429.000.patch > > > The libhdfs++ C API supports taking function pointers for log plugins and > defines levels for components and severity. It'd be nice to have an > idiomatic C++ logging interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13399) Make Client field AlignmentContext non-static.
[ https://issues.apache.org/jira/browse/HDFS-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461569#comment-16461569 ] Plamen Jeliazkov commented on HDFS-13399: - Hey Konstantin, Alright I think things are cleared up now for the most part about the overall plan. Just a couple questions / nits: Regarding (1), I see also you left {{DFSClient.getAlignmentContext}} in. Should we just remove it? I don't believe there is a path from the {{ProxyProvider}} back to the {{DFSClient}}. Also please take a look at the comment I made here (just above): https://issues.apache.org/jira/browse/HDFS-13399?focusedCommentId=16454623=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16454623 We can discuss this in detail next meeting but there is a serious concern regarding the {{FSEditLogAsync}} and sending the client state over the RPC. My suspicion is we need to make additional changes to {{FSEditLogAsync...}} > Make Client field AlignmentContext non-static. > -- > > Key: HDFS-13399 > URL: https://issues.apache.org/jira/browse/HDFS-13399 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-12943 >Reporter: Plamen Jeliazkov >Assignee: Plamen Jeliazkov >Priority: Major > Attachments: HDFS-13399-HDFS-12943.000.patch, > HDFS-13399-HDFS-12943.001.patch, HDFS-13399-HDFS-12943.002.patch, > HDFS-13399-HDFS-12943.003.patch, HDFS-13399-HDFS-12943.004.patch, > HDFS-13399-HDFS-12943.005.patch, HDFS-13399-HDFS-12943.006.patch > > > In HDFS-12977, DFSClient's constructor was altered to make use of a new > static method in Client that allowed one to set an AlignmentContext. This > work is to remove that static field and make each DFSClient pass it's > AlignmentContext down to the proxy Call level. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-15) Add memory profiler support to Genesis
[ https://issues.apache.org/jira/browse/HDDS-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDDS-15: - Resolution: Fixed Status: Resolved (was: Patch Available) [~xyao] Thanks for the reviews. I have committed this to trunk. > Add memory profiler support to Genesis > -- > > Key: HDDS-15 > URL: https://issues.apache.org/jira/browse/HDDS-15 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Tools >Affects Versions: 0.2.1 >Reporter: Anu Engineer >Assignee: Anu Engineer >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-15.001.patch, HDDS-15.002.patch, HDDS-15.003.patch > > > Add the ability to sample max memory usage when running tests under Genesis. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13520) fuse_dfs to support keytab based login
[ https://issues.apache.org/jira/browse/HDFS-13520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-13520: --- Description: It looks like the current fuse_dfs implementation supports login using current kerberos credential. If the tgt expires, it fails with the following error: {noformat} hdfsBuilderConnect(forceNewInstance=1, nn=hdfs://ns1, port=0, kerbTicketCachePath=/tmp/krb5cc_2000, userName=systest) error: LoginException: Unable to obtain Principal Name for authentication org.apache.hadoop.security.KerberosAuthException: failure to login: for user: systest using ticket cache file: /tmp/krb5cc_2000 javax.security.auth.login.LoginException: Unable to obtain Principal Name for authentication at org.apache.hadoop.security.UserGroupInformation.getUGIFromTicketCache(UserGroupInformation.java:807) at org.apache.hadoop.security.UserGroupInformation.getBestUGI(UserGroupInformation.java:742) at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:404) Caused by: javax.security.auth.login.LoginException: Unable to obtain Principal Name for authentication at com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:841) at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:704) at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) at javax.security.auth.login.LoginContext.login(LoginContext.java:587) at org.apache.hadoop.security.UserGroupInformation.getUGIFromTicketCache(UserGroupInformation.java:788) ... 2 more {noformat} This is easily reproducible in a test cluster with an extremely short ticket lifetime (e.g. 1 minute). Note: HDFS-3608 addresses a similar issue, but in this case, since the ticket cache file itself does not change, fuse couldn't detect the change and update. It looks like it should call UserGroupInformation#loginFromKeytab() in the beginning, similar to how the balancer supports keytab-based login (HDFS-9804). Thanks [~xiaochen] for the idea. A quick workaround would be a crontab job that periodically renews the Kerberos ticket with a keytab, say every 8 hours. was: It looks like the current fuse_dfs implementation supports login using current kerberos credential. If the tgt expires, it fails with the following error: {noformat} hdfsBuilderConnect(forceNewInstance=1, nn=hdfs://ns1, port=0, kerbTicketCachePath=/tmp/krb5cc_2000, userName=systest) error: LoginException: Unable to obtain Principal Name for authentication org.apache.hadoop.security.KerberosAuthException: failure to login: for user: systest using ticket cache file: /tmp/krb5cc_2000 javax.security.auth.login.LoginException: Unable to obtain Principal Name for authentication at org.apache.hadoop.security.UserGroupInformation.getUGIFromTicketCache(UserGroupInformation.java:807) at org.apache.hadoop.security.UserGroupInformation.getBestUGI(UserGroupInformation.java:742) at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:404) Caused by: javax.security.auth.login.LoginException: Unable to obtain Principal Name for authentication at com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:841) at 
com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:704) at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) at javax.security.auth.login.LoginContext.login(LoginContext.java:587) at
[jira] [Created] (HDFS-13520) fuse_dfs to support keytab based login
Wei-Chiu Chuang created HDFS-13520: -- Summary: fuse_dfs to support keytab based login Key: HDFS-13520 URL: https://issues.apache.org/jira/browse/HDFS-13520 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Environment: Hadoop 2.6/3.0, Kerberized, fuse_dfs Reporter: Wei-Chiu Chuang It looks like the current fuse_dfs implementation supports login using current kerberos credential. If the tgt expires, it fails with the following error: {noformat} hdfsBuilderConnect(forceNewInstance=1, nn=hdfs://ns1, port=0, kerbTicketCachePath=/tmp/krb5cc_2000, userName=systest) error: LoginException: Unable to obtain Principal Name for authentication org.apache.hadoop.security.KerberosAuthException: failure to login: for user: systest using ticket cache file: /tmp/krb5cc_2000 javax.security.auth.login.LoginException: Unable to obtain Principal Name for authentication at org.apache.hadoop.security.UserGroupInformation.getUGIFromTicketCache(UserGroupInformation.java:807) at org.apache.hadoop.security.UserGroupInformation.getBestUGI(UserGroupInformation.java:742) at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:404) Caused by: javax.security.auth.login.LoginException: Unable to obtain Principal Name for authentication at com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:841) at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:704) at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) at 
javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) at javax.security.auth.login.LoginContext.login(LoginContext.java:587) at org.apache.hadoop.security.UserGroupInformation.getUGIFromTicketCache(UserGroupInformation.java:788) ... 2 more {noformat} This is easily reproducible in a test cluster with an extremely short ticket lifetime (e.g. 1 minute). Note: HDFS-3608 addresses a similar issue, but in this case, since the ticket cache file itself does not change, fuse couldn't detect the change and update. It looks like it should call UserGroupInformation#loginFromKeytab() in the beginning, similar to how the balancer supports keytab-based login (HDFS-9804). Thanks [~xiaochen] for the idea. Or alternatively, have a background process that continuously relogins from the keytab. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
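The crontab workaround mentioned in this issue amounts to keeping the ticket cache fresh so fuse_dfs always finds a valid TGT. A minimal sketch; the principal, keytab path, and schedule are illustrative assumptions (the cache path matches the kerbTicketCachePath=/tmp/krb5cc_2000 seen in the error above):

```crontab
# /etc/cron.d/fuse-dfs-krb5 (sketch): renew the Kerberos ticket cache every
# 8 hours. Principal and keytab path are illustrative assumptions; the cache
# path matches the one fuse_dfs reports in the LoginException above.
0 */8 * * * systest kinit -kt /etc/security/keytabs/systest.keytab -c /tmp/krb5cc_2000 systest@EXAMPLE.COM
```

Note this only papers over the expiry; per HDFS-3608 the cache file itself changing (vs. being rewritten in place) affects whether the client picks up the renewal, which is why an in-process keytab login is the preferred fix.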
[jira] [Commented] (HDFS-13481) TestRollingFileSystemSinkWithHdfs#testFlushThread: test failed intermittently
[ https://issues.apache.org/jira/browse/HDFS-13481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461418#comment-16461418 ] Daniel Templeton commented on HDFS-13481: - LGTM +1. If there are no other comments by end of day, I'll commit it. > TestRollingFileSystemSinkWithHdfs#testFlushThread: test failed intermittently > - > > Key: HDFS-13481 > URL: https://issues.apache.org/jira/browse/HDFS-13481 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Major > Attachments: HDFS-13481.001.patch > > > The test fails very rarely on a laptop, but very commonly during Jenkins runs. > {noformat} > Error Message > Flush thread did not run within 10 seconds > Stacktrace > java.lang.AssertionError: Flush thread did not run within 10 seconds > at > org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs.testFlushThread(TestRollingFileSystemSinkWithHdfs.java:291) > {noformat} > According to my test, this breaks about 0.3% locally. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13429) libhdfs++ Expose a C++ logging API
[ https://issues.apache.org/jira/browse/HDFS-13429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461399#comment-16461399 ] James Clampffer commented on HDFS-13429: First patch ready for review. Comments: -The logger was already done in C++ but the interface didn't make it into the include/hdfspp directory. Most of this patch is just making the existing code consistent so it could be part of the public API. -C logging API that was in hdfspp/log.h was moved into hdfspp/hdfs_ext.h with the rest of the C API. This was done to avoid confusion over which headers were pure C and which used C++. The C++ logging API now lives in hdfspp/log.h. -Added a new test to make sure constants exposed in the public API aren't accidentally changed. > libhdfs++ Expose a C++ logging API > -- > > Key: HDFS-13429 > URL: https://issues.apache.org/jira/browse/HDFS-13429 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer >Priority: Critical > Attachments: HDFS-13429.000.patch > > > The libhdfs++ C API supports taking function pointers for log plugins and > defines levels for components and severity. It'd be nice to have an > idiomatic C++ logging interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13429) libhdfs++ Expose a C++ logging API
[ https://issues.apache.org/jira/browse/HDFS-13429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-13429: --- Attachment: HDFS-13429.000.patch > libhdfs++ Expose a C++ logging API > -- > > Key: HDFS-13429 > URL: https://issues.apache.org/jira/browse/HDFS-13429 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer >Priority: Critical > Attachments: HDFS-13429.000.patch > > > The libhdfs++ C API supports taking function pointers for log plugins and > defines levels for components and severity. It'd be nice to have an > idiomatic C++ logging interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13443) RBF: Update mount table cache immediately after changing (add/update/remove) mount table entries.
[ https://issues.apache.org/jira/browse/HDFS-13443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461357#comment-16461357 ] Íñigo Goiri commented on HDFS-13443: Other than the checkstyle issues this looks good. The unit tests seem unrelated. A couple minor nits: * You can do {{TimeUnit.MINUTES.toMillis(5)}}, same for the 1 minute one too; then you can just set the hdfs-default value to be 1m and 5m. * {{MountTableRefreshThread}} has a logging statement that uses + and could use the {} placeholder instead. * For the succesCount increase you can just use ++. Once those are fixed, +1. I'd like somebody else to review too to make sure I'm not missing anything. > RBF: Update mount table cache immediately after changing (add/update/remove) > mount table entries. > - > > Key: HDFS-13443 > URL: https://issues.apache.org/jira/browse/HDFS-13443 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs >Reporter: Mohammad Arshad >Assignee: Mohammad Arshad >Priority: Major > Labels: RBF > Attachments: HDFS-13443-branch-2.001.patch, > HDFS-13443-branch-2.002.patch, HDFS-13443.001.patch, HDFS-13443.002.patch, > HDFS-13443.003.patch, HDFS-13443.004.patch, HDFS-13443.005.patch, > HDFS-13443.006.patch > > > Currently the mount table cache is updated periodically; by default the cache is > updated every minute. After a change in the mount table, user operations may still > use the old mount table. This is a bit wrong. > To update the mount table cache, maybe we can do the following > * *Add refresh API in MountTableManager which will update mount table cache.* > * *When there is a change in mount table entries, router admin server can > update its cache and ask other routers to update their cache*. 
For example, if > there are three routers R1, R2, R3 in a cluster, then the add mount table entry API, > at the admin server side, will perform the following sequence of actions > ## the user submits an add mount table entry request on R1 > ## R1 adds the mount table entry in the state store > ## R1 calls the refresh API on R2 > ## R1 calls the refresh API on R3 > ## R1 directly refreshes its cache > ## the add mount table entry response is sent back to the user. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
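The six-step sequence proposed in the HDFS-13443 description can be sketched as plain Java. This is a hypothetical simulation only: the class and method names (Router, refreshMountTableEntries) are illustrative stand-ins, not the actual patch's classes, and the state store is modeled as a version counter.

```java
import java.util.List;

// Illustrative stand-in for a router's mount table cache; not the real
// HDFS-13443 class. The "state store" is modeled as a version number.
class Router {
    private long cacheVersion = 0;

    // Refresh API: reload the mount table cache from the state store.
    void refreshMountTableEntries(long stateStoreVersion) {
        cacheVersion = stateStoreVersion;
    }

    long getCacheVersion() {
        return cacheVersion;
    }
}

public class MountTableRefreshSketch {
    public static void main(String[] args) {
        long stateStoreVersion = 0;
        Router r1 = new Router();
        Router r2 = new Router();
        Router r3 = new Router();

        // Steps 1-2: user submits an add-entry request on R1; R1 writes the
        // new entry to the state store.
        stateStoreVersion++;
        // Steps 3-4: R1 calls the refresh API on R2 and R3.
        for (Router other : List.of(r2, r3)) {
            other.refreshMountTableEntries(stateStoreVersion);
        }
        // Step 5: R1 refreshes its own cache directly.
        r1.refreshMountTableEntries(stateStoreVersion);
        // Step 6: the response goes back to the user; by now no router
        // serves a stale mount table.
        boolean consistent = r1.getCacheVersion() == stateStoreVersion
                && r2.getCacheVersion() == stateStoreVersion
                && r3.getCacheVersion() == stateStoreVersion;
        System.out.println(consistent ? "consistent" : "stale");
    }
}
```

The point of the ordering is that the user's response is only sent after every router has refreshed, which is what removes the up-to-one-minute staleness window of the periodic update.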
[jira] [Updated] (HDDS-10) docker changes to test secure ozone cluster
[ https://issues.apache.org/jira/browse/HDDS-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDDS-10: --- Summary: docker changes to test secure ozone cluster (was: Create SecureMiniOzoneCluster to facilitate security related test cases in ozone) > docker changes to test secure ozone cluster > --- > > Key: HDDS-10 > URL: https://issues.apache.org/jira/browse/HDDS-10 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Security >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Fix For: 0.3.0 > > > Create SecureMiniOzoneCluster to facilitate security related test cases in > ozone -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-10) docker changes to test secure ozone cluster
[ https://issues.apache.org/jira/browse/HDDS-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDDS-10: --- Description: Update docker compose and settings to test secure ozone cluster. (was: Create SecureMiniOzoneCluster to facilitate security related test cases in ozone) > docker changes to test secure ozone cluster > --- > > Key: HDDS-10 > URL: https://issues.apache.org/jira/browse/HDDS-10 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Security >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Fix For: 0.3.0 > > > Update docker compose and settings to test secure ozone cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461275#comment-16461275 ] Erik Krogen edited comment on HDFS-13286 at 5/2/18 5:28 PM: Hey [~csun], sorry for taking a while to get back to you on this. Your new {{HAServiceState}} changes inspired me to do a more comprehensive look at how those states are used and I noticed a few things: * Take a look at {{BPServiceActor}} L905-909. I haven't looked at HDFS-9917 yet, but judging from the comment it seems we probably want to change this if-condition to {{state == HAServiceState.STANDBY || state == HAServiceState.OBSERVER}}. Can you see if you agree? * The constructor of {{StandbyState}} needs to be modified to {{super(isObserver ? HAServiceState.OBSERVER : HAServiceState.STANDBY);}} instead of just always passing {{STANDBY}}, else things which use {{HAState#getServiceState()}} will be wrong (see for example {{NameNode#getServiceStatus()}}). * It looks like we also need to update the enums in {{FederationNamenodeServiceState}} and {{NNHAStatusHeartbeatProto}}, and their associated usages * We should be able to remove the TODO on {{NameNode}} L1866? * I think this section of {{FailoverController#preFailoverChecks()}} may need some work: {code} if (!toSvcStatus.getState().equals(HAServiceState.STANDBY)) { throw new FailoverFailedException( "Can't failover to an active service"); } if (!toSvcStatus.isReadyToBecomeActive()) { String notReadyReason = toSvcStatus.getNotReadyReason(); if (!forceActive) { throw new FailoverFailedException( target + " is not ready to become active: " + notReadyReason); } else { LOG.warn("Service is not ready to become active, but forcing: {}", notReadyReason); } } {code} It seems the first if-condition is assuming there are only two possible states, so if the state is not STANDBY, it must be ACTIVE. I think we should update this to explicitly check for ACTIVE. 
Next, if the service is in OBSERVER state, {{isReadyToBecomeActive()}} will be false. In this case, {{FailoverController#preFailoverChecks()}} will still allow this operation if {{forceActive}} is true. I don't think we want to allow {{forceActive}} to attempt to failover an observer, right? * For all three usages of {{FSNamesystem#isInStandbyState()}}, it actually seems to me that they should apply if it is in observer or standby state, can you double check and if so update accordingly? was (Author: xkrogen): Hey [~csun], sorry for taking a while to get back to you on this. Your new {{HAServiceState}} changes inspired me to do a more comprehensive look at how those states are used and I noticed a few things: * Take a look at {{BPServiceActor}} L905-909. I haven't looked at HDFS-9917 yet, but judging from the comment it seems we probably want to change this if-condition to {{state == HAServiceState.STANDBY || state == HAServiceState.OBSERVER}}. Can you see if you agree? * The constructor of {{StandbyState}} needs to be modified to {{super(isObserver ? HAServiceState.OBSERVER : HAServiceState.STANDBY);}} instead of just always passing {{STANDBY}}, else things which use {{HAState#getServiceState()}} will be wrong (see for example {{NameNode#getServiceStatus()}}). * It looks like we also need to update the enums in {{FederationNamenodeServiceState}} and {{NNHAStatusHeartbeatProto}}, and their associated usages * We should be able to remove the TODO on {{NameNode}} L1866? 
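The explicit-state check suggested in the comment above can be sketched as follows. This is a hedged, self-contained illustration, not code from hadoop-common: {{HAServiceState}} and {{FailoverFailedException}} are local stand-ins, and {{isEligibleTarget}} is a hypothetical helper name.

```java
// Minimal sketch of the preFailoverChecks() change discussed above: check
// ACTIVE and OBSERVER explicitly instead of assuming "not STANDBY == ACTIVE".
// HAServiceState and FailoverFailedException are simplified local stand-ins
// for the hadoop-common classes; isEligibleTarget is a hypothetical helper.
public class PreFailoverCheckSketch {
    public enum HAServiceState { ACTIVE, STANDBY, OBSERVER }

    public static class FailoverFailedException extends Exception {
        public FailoverFailedException(String msg) { super(msg); }
    }

    // Only a STANDBY service is an eligible failover target.
    public static boolean isEligibleTarget(HAServiceState state) {
        return state == HAServiceState.STANDBY;
    }

    public static void checkTargetState(HAServiceState state)
            throws FailoverFailedException {
        if (state == HAServiceState.ACTIVE) {
            throw new FailoverFailedException("Can't failover to an active service");
        }
        if (state == HAServiceState.OBSERVER) {
            // Rejected even when forceActive is set: an observer should first
            // transition to standby before it can become active.
            throw new FailoverFailedException("Can't failover to an observer service");
        }
        // STANDBY falls through: eligible for failover.
    }
}
```

With this shape, an OBSERVER target is rejected unconditionally, rather than falling into the {{forceActive}} escape hatch.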
> Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286-HDFS-12943.000.patch, > HDFS-13286-HDFS-12943.001.patch, HDFS-13286-HDFS-12943.002.patch, > HDFS-13286-HDFS-12943.003.patch, HDFS-13286-HDFS-12943.004.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13245) RBF: State store DBMS implementation
[ https://issues.apache.org/jira/browse/HDFS-13245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461334#comment-16461334 ] Íñigo Goiri commented on HDFS-13245: Let's add TestStateStoreDisabledNameserviceStore in a separate JIRA. In general, I want to make sure that if somebody adds a new store, unit tests will fail if they don't add it to the SQL impl. > RBF: State store DBMS implementation > > > Key: HDFS-13245 > URL: https://issues.apache.org/jira/browse/HDFS-13245 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: maobaolong >Assignee: Yiran Wu >Priority: Major > Attachments: HDFS-13245.001.patch, HDFS-13245.002.patch, > HDFS-13245.003.patch, HDFS-13245.004.patch, HDFS-13245.005.patch, > HDFS-13245.006.patch > > > Add a DBMS implementation for the State Store. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-6) Enable SCM kerberos auth
[ https://issues.apache.org/jira/browse/HDDS-6?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDDS-6: -- Attachment: HDDS-4-HDDS-6.00.patch > Enable SCM kerberos auth > > > Key: HDDS-6 > URL: https://issues.apache.org/jira/browse/HDDS-6 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM, Security >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Fix For: 0.3.0 > > Attachments: HDDS-4-HDDS-6.00.patch > > > Enable SCM kerberos auth
[jira] [Updated] (HDDS-6) Enable SCM kerberos auth
[ https://issues.apache.org/jira/browse/HDDS-6?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDDS-6: -- Attachment: (was: HDDS-6.00.patch) > Enable SCM kerberos auth > > > Key: HDDS-6 > URL: https://issues.apache.org/jira/browse/HDDS-6 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM, Security >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Fix For: 0.3.0 > > > Enable SCM kerberos auth
[jira] [Commented] (HDFS-13507) RBF: Remove update functionality from routeradmin's add cmd
[ https://issues.apache.org/jira/browse/HDFS-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461317#comment-16461317 ] Íñigo Goiri commented on HDFS-13507: I would add the command usage of HDFSRouterFederation.md in a different JIRA and limit this one to the change that breaks compatibility. > RBF: Remove update functionality from routeradmin's add cmd > --- > > Key: HDFS-13507 > URL: https://issues.apache.org/jira/browse/HDFS-13507 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Wei Yan >Assignee: Gang Li >Priority: Minor > Labels: incompatible > Attachments: HDFS-13507.000.patch, HDFS-13507.001.patch, > HDFS-13507.002.patch > > > Follow up the discussion in HDFS-13326. We should remove the "update" > functionality from routeradmin's add cmd, to make it consistent with RPC > calls. > Note that: this is an incompatible change. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-15) Add memory profiler support to Genesis
[ https://issues.apache.org/jira/browse/HDDS-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461310#comment-16461310 ] Anu Engineer commented on HDDS-15: -- [~xyao] Thanks for the reviews. I had added the Lic and got a clean Jenkins build. I will commit this patch now. > Add memory profiler support to Genesis > -- > > Key: HDDS-15 > URL: https://issues.apache.org/jira/browse/HDDS-15 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Tools >Affects Versions: 0.2.1 >Reporter: Anu Engineer >Assignee: Anu Engineer >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-15.001.patch, HDDS-15.002.patch, HDDS-15.003.patch > > > Add the ability to sample max memory usage when running tests under Genesis.
[jira] [Commented] (HDFS-13488) RBF: Reject requests when a Router is overloaded
[ https://issues.apache.org/jira/browse/HDFS-13488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461297#comment-16461297 ] Íñigo Goiri commented on HDFS-13488: Thanks [~linyiqun] for committing. > RBF: Reject requests when a Router is overloaded > > > Key: HDFS-13488 > URL: https://issues.apache.org/jira/browse/HDFS-13488 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.1 >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.4 > > Attachments: HDFS-13488.000.patch, HDFS-13488.001.patch, > HDFS-13488.002.patch, HDFS-13488.003.patch, HDFS-13488.004.patch > > > A Router might be overloaded when handling special cases (e.g. a slow > subcluster). The Router could reject the requests and the client could try > with another Router. We should leverage the Standby mechanism for this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461275#comment-16461275 ] Erik Krogen commented on HDFS-13286: Hey [~csun], sorry for taking a while to get back to you on this. Your new {{HAServiceState}} changes inspired me to do a more comprehensive look at how those states are used and I noticed a few things: * Take a look at {{BPServiceActor}} L905-909. I haven't looked at HDFS-9917 yet, but judging from the comment it seems we probably want to change this if-condition to {{state == HAServiceState.STANDBY || state == HAServiceState.OBSERVER}}. Can you see if you agree? * The constructor of {{StandbyState}} needs to be modified to {{super(isObserver ? HAServiceState.OBSERVER : HAServiceState.STANDBY);}} instead of just always passing {{STANDBY}}, else things which use {{HAState#getServiceState()}} will be wrong (see for example {{NameNode#getServiceStatus()}}). * It looks like we also need to update the enums in {{FederationNamenodeServiceState}} and {{NNHAStatusHeartbeatProto}}, and their associated usages * We should be able to remove the TODO on {{NameNode}} L1866? > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286-HDFS-12943.000.patch, > HDFS-13286-HDFS-12943.001.patch, HDFS-13286-HDFS-12943.002.patch, > HDFS-13286-HDFS-12943.003.patch, HDFS-13286-HDFS-12943.004.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. 
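The {{StandbyState}} constructor change called out in the comment above can be illustrated with a small stand-alone sketch. The class shapes here are simplified stand-ins for the real hadoop-hdfs classes, kept only to show why the reported state must come from the observer flag.

```java
// Sketch of the constructor change discussed above: StandbyState should
// report OBSERVER when constructed as an observer, instead of always
// reporting STANDBY. HAState/StandbyState are simplified stand-ins for the
// hadoop-hdfs classes, not the project's actual code.
public class StandbyStateSketch {
    public enum HAServiceState { ACTIVE, STANDBY, OBSERVER }

    public static class HAState {
        private final HAServiceState state;
        public HAState(HAServiceState state) { this.state = state; }
        public HAServiceState getServiceState() { return state; }
    }

    public static class StandbyState extends HAState {
        public StandbyState(boolean isObserver) {
            // Key change: derive the reported state from the observer flag so
            // callers of getServiceState() (e.g. NameNode#getServiceStatus())
            // see OBSERVER for observer nodes rather than STANDBY.
            super(isObserver ? HAServiceState.OBSERVER : HAServiceState.STANDBY);
        }
    }
}
```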
[jira] [Commented] (HDFS-11807) libhdfs++: Get minidfscluster tests running under valgrind
[ https://issues.apache.org/jira/browse/HDFS-11807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461243#comment-16461243 ] Hudson commented on HDFS-11807: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14110 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14110/]) HDFS-11807. libhdfs++: Get minidfscluster tests running under valgrind. (jhc: rev 19ae588fde9930c042cdb2848b8a1a0ff514b575) * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/expect.h * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/CMakeLists.txt * (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests/memcheck.supp * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests/CMakeLists.txt * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_mini_stress.c > libhdfs++: Get minidfscluster tests running under valgrind > -- > > Key: HDFS-11807 > URL: https://issues.apache.org/jira/browse/HDFS-11807 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: Anatoli Shein >Priority: Major > Attachments: HDFS-11807.HDFS-8707.000.patch, > HDFS-11807.HDFS-8707.001.patch, HDFS-11807.HDFS-8707.002.patch, > HDFS-11807.HDFS-8707.003.patch, HDFS-11807.HDFS-8707.004.patch, > HDFS-11807.HDFS-8707.005.patch, HDFS-11807.HDFS-8707.006.patch, > HDFS-11807.HDFS-8707.007.patch, HDFS-11807.HDFS-8707.008.patch, > HDFS-11807.HDFS-8707.009.patch > > > The gmock based unit tests generally don't expose race conditions and memory > stomps. A good way to expose these is running libhdfs++ stress tests and > tools under valgrind and pointing them at a real cluster. Right now the CI > tools don't do that so bugs occasionally slip in and aren't caught until they > cause trouble in applications that use libhdfs++ for HDFS access. 
> The reason the minidfscluster tests don't run under valgrind is because the > GC and JIT compiler in the embedded JVM do things that look like errors to > valgrind. I'd like to have these tests do some basic setup and then fork > into two processes: one for the minidfscluster stuff and one for the > libhdfs++ client test. A small amount of shared memory can be used to > provide a place for the minidfscluster to stick the hdfsBuilder object that > the client needs to get info about which port to connect to. Can also stick > a condition variable there to let the minidfscluster know when it can shut > down. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11807) libhdfs++: Get minidfscluster tests running under valgrind
[ https://issues.apache.org/jira/browse/HDFS-11807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-11807: --- Resolution: Fixed Status: Resolved (was: Patch Available) > libhdfs++: Get minidfscluster tests running under valgrind > -- > > Key: HDFS-11807 > URL: https://issues.apache.org/jira/browse/HDFS-11807 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: Anatoli Shein >Priority: Major > Attachments: HDFS-11807.HDFS-8707.000.patch, > HDFS-11807.HDFS-8707.001.patch, HDFS-11807.HDFS-8707.002.patch, > HDFS-11807.HDFS-8707.003.patch, HDFS-11807.HDFS-8707.004.patch, > HDFS-11807.HDFS-8707.005.patch, HDFS-11807.HDFS-8707.006.patch, > HDFS-11807.HDFS-8707.007.patch, HDFS-11807.HDFS-8707.008.patch, > HDFS-11807.HDFS-8707.009.patch > > > The gmock based unit tests generally don't expose race conditions and memory > stomps. A good way to expose these is running libhdfs++ stress tests and > tools under valgrind and pointing them at a real cluster. Right now the CI > tools don't do that so bugs occasionally slip in and aren't caught until they > cause trouble in applications that use libhdfs++ for HDFS access. > The reason the minidfscluster tests don't run under valgrind is because the > GC and JIT compiler in the embedded JVM do things that look like errors to > valgrind. I'd like to have these tests do some basic setup and then fork > into two processes: one for the minidfscluster stuff and one for the > libhdfs++ client test. A small amount of shared memory can be used to > provide a place for the minidfscluster to stick the hdfsBuilder object that > the client needs to get info about which port to connect to. Can also stick > a condition variable there to let the minidfscluster know when it can shut > down. 
[jira] [Commented] (HDFS-11807) libhdfs++: Get minidfscluster tests running under valgrind
[ https://issues.apache.org/jira/browse/HDFS-11807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461217#comment-16461217 ] James Clampffer commented on HDFS-11807: +1, I just committed this to trunk. Sorry it took so long to get around to it. Some tests I did: -Build with gcc and clang in the dev-support docker container and ran tests -Add memory leaks to make sure it caught them -Add invalid reads and writes to make sure they were caught > libhdfs++: Get minidfscluster tests running under valgrind > -- > > Key: HDFS-11807 > URL: https://issues.apache.org/jira/browse/HDFS-11807 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: Anatoli Shein >Priority: Major > Attachments: HDFS-11807.HDFS-8707.000.patch, > HDFS-11807.HDFS-8707.001.patch, HDFS-11807.HDFS-8707.002.patch, > HDFS-11807.HDFS-8707.003.patch, HDFS-11807.HDFS-8707.004.patch, > HDFS-11807.HDFS-8707.005.patch, HDFS-11807.HDFS-8707.006.patch, > HDFS-11807.HDFS-8707.007.patch, HDFS-11807.HDFS-8707.008.patch, > HDFS-11807.HDFS-8707.009.patch > > > The gmock based unit tests generally don't expose race conditions and memory > stomps. A good way to expose these is running libhdfs++ stress tests and > tools under valgrind and pointing them at a real cluster. Right now the CI > tools don't do that so bugs occasionally slip in and aren't caught until they > cause trouble in applications that use libhdfs++ for HDFS access. > The reason the minidfscluster tests don't run under valgrind is because the > GC and JIT compiler in the embedded JVM do things that look like errors to > valgrind. I'd like to have these tests do some basic setup and then fork > into two processes: one for the minidfscluster stuff and one for the > libhdfs++ client test. A small amount of shared memory can be used to > provide a place for the minidfscluster to stick the hdfsBuilder object that > the client needs to get info about which port to connect to. 
Can also stick > a condition variable there to let the minidfscluster know when it can shut > down.
[jira] [Updated] (HDFS-12981) HDFS renameSnapshot to Itself for Non Existent snapshot should throw error
[ https://issues.apache.org/jira/browse/HDFS-12981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kitti Nanasi updated HDFS-12981: Attachment: HDFS-12981-branch-2.6.0.001.patch > HDFS renameSnapshot to Itself for Non Existent snapshot should throw error > --- > > Key: HDFS-12981 > URL: https://issues.apache.org/jira/browse/HDFS-12981 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 2.6.0 >Reporter: Sailesh Patel >Assignee: Kitti Nanasi >Priority: Minor > Attachments: HDFS-12981-branch-2.6.0.001.patch, HDFS-12981.001.patch, > HDFS-12981.002.patch > > > When trying to rename a non-existent HDFS snapshot to ITSELF, there are no > errors and exits with a success code. > The steps to reproduce this issue is: > hdfs dfs -mkdir /tmp/dir1 > hdfs dfsadmin -allowSnapshot /tmp/dir1 > hdfs dfs -createSnapshot /tmp/dir1 snap1_dir > Rename from non-existent to another_non-existent : errors and return code 1. > This is correct. > hdfs dfs -renameSnapshot /tmp/dir1 nonexist another_nonexist : > echo $? > > renameSnapshot: The snapshot nonexist does not exist for directory /tmp/dir1 > Rename from non-existent to non-existent : no errors and return code 0 > instead of Error and return code 1. > hdfs dfs -renameSnapshot /tmp/dir1 nonexist nonexist ; echo $? > Current behavior: No error and return code 0. > Expected behavior: An error returned and return code 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13398) Hdfs recursive listing operation is very slow
[ https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Sachdev updated HDFS-13398: Attachment: HDFS-13398.001.patch > Hdfs recursive listing operation is very slow > - > > Key: HDFS-13398 > URL: https://issues.apache.org/jira/browse/HDFS-13398 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.1 > Environment: HCFS file system where HDP 2.6.1 is connected to ECS > (Object Store). >Reporter: Ajay Sachdev >Assignee: Ajay Sachdev >Priority: Major > Fix For: 2.7.1 > > Attachments: HDFS-13398.001.patch, parallelfsPatch > > > The hdfs dfs -ls -R command is sequential in nature and is very slow for a > HCFS system. We have seen around 6 mins for 40K directory/files structure. > The proposal is to use multithreading approach to speed up recursive list, du > and count operations. > We have tried a ForkJoinPool implementation to improve performance for > recursive listing operation. > [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli] > commit id : > 82387c8cd76c2e2761bd7f651122f83d45ae8876 > Another implementation is to use Java Executor Service to improve performance > to run listing operation in multiple threads in parallel. This has > significantly reduced the time to 40 secs from 6 mins. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13398) Hdfs recursive listing operation is very slow
[ https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461122#comment-16461122 ] Ajay Sachdev commented on HDFS-13398: - Hi Mukul/Arpit, Sorry for late response. We would like to get a formal review process for this patch before we deploy on customer site. I will work on items #3 and #4 above. Also I was unable to find trunk branch in [https://github.com/hortonworks/hadoop-release.] So we have used HDP tag - HDP-2.6.2.0-205-tag I have also attached the patch against this tag. I would appreciate if you could take a look at code and get a review as well. Thanks Ajay[^HDFS-13398.001.patch] > Hdfs recursive listing operation is very slow > - > > Key: HDFS-13398 > URL: https://issues.apache.org/jira/browse/HDFS-13398 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.1 > Environment: HCFS file system where HDP 2.6.1 is connected to ECS > (Object Store). >Reporter: Ajay Sachdev >Assignee: Ajay Sachdev >Priority: Major > Fix For: 2.7.1 > > Attachments: HDFS-13398.001.patch, parallelfsPatch > > > The hdfs dfs -ls -R command is sequential in nature and is very slow for a > HCFS system. We have seen around 6 mins for 40K directory/files structure. > The proposal is to use multithreading approach to speed up recursive list, du > and count operations. > We have tried a ForkJoinPool implementation to improve performance for > recursive listing operation. > [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli] > commit id : > 82387c8cd76c2e2761bd7f651122f83d45ae8876 > Another implementation is to use Java Executor Service to improve performance > to run listing operation in multiple threads in parallel. This has > significantly reduced the time to 40 secs from 6 mins. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
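The multithreaded recursive-listing idea described in HDFS-13398 above can be sketched with a {{ForkJoinPool}}. The actual patch is not shown in this thread; this stand-alone version walks a local {{java.io.File}} tree in place of Hadoop's {{FileSystem#listStatus}}, and the names {{ParallelListing}}/{{countEntries}} are illustrative, not from the patch.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch of parallel recursive listing: one task per directory, with
// subdirectories forked as parallel subtasks. java.io.File stands in for
// Hadoop's FileSystem API so the example is self-contained.
public class ParallelListing {

    static class CountTask extends RecursiveTask<Long> {
        private final File dir;

        CountTask(File dir) {
            this.dir = dir;
        }

        @Override
        protected Long compute() {
            File[] children = dir.listFiles();
            if (children == null) {
                return 0L; // unreadable directory or a plain file
            }
            long count = 0;
            List<CountTask> subtasks = new ArrayList<>();
            for (File child : children) {
                count++; // count the entry itself, like ls -R does
                if (child.isDirectory()) {
                    CountTask task = new CountTask(child);
                    task.fork(); // descend into subdirectories in parallel
                    subtasks.add(task);
                }
            }
            for (CountTask task : subtasks) {
                count += task.join();
            }
            return count;
        }
    }

    public static long countEntries(File root) {
        return new ForkJoinPool().invoke(new CountTask(root));
    }
}
```

The same fork/join shape applies to du and count: the per-directory work is independent, so the speedup reported in the issue (6 min down to ~40 s for ~40K entries) comes from overlapping the per-directory RPCs.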
[jira] [Updated] (HDFS-12981) HDFS renameSnapshot to Itself for Non Existent snapshot should throw error
[ https://issues.apache.org/jira/browse/HDFS-12981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kitti Nanasi updated HDFS-12981: Attachment: HDFS-12981.002.patch > HDFS renameSnapshot to Itself for Non Existent snapshot should throw error > --- > > Key: HDFS-12981 > URL: https://issues.apache.org/jira/browse/HDFS-12981 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 2.6.0 >Reporter: Sailesh Patel >Assignee: Kitti Nanasi >Priority: Minor > Attachments: HDFS-12981.001.patch, HDFS-12981.002.patch > > > When trying to rename a non-existent HDFS snapshot to ITSELF, there are no > errors and exits with a success code. > The steps to reproduce this issue is: > hdfs dfs -mkdir /tmp/dir1 > hdfs dfsadmin -allowSnapshot /tmp/dir1 > hdfs dfs -createSnapshot /tmp/dir1 snap1_dir > Rename from non-existent to another_non-existent : errors and return code 1. > This is correct. > hdfs dfs -renameSnapshot /tmp/dir1 nonexist another_nonexist : > echo $? > > renameSnapshot: The snapshot nonexist does not exist for directory /tmp/dir1 > Rename from non-existent to non-existent : no errors and return code 0 > instead of Error and return code 1. > hdfs dfs -renameSnapshot /tmp/dir1 nonexist nonexist ; echo $? > Current behavior: No error and return code 0. > Expected behavior: An error returned and return code 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-16) Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition.
[ https://issues.apache.org/jira/browse/HDDS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460971#comment-16460971 ] genericqa commented on HDDS-16: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 55s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 10s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 10s{color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 11s{color} | {color:red} common in trunk failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 24s{color} | {color:red} container-service in trunk failed. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 51s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-ozone/integration-test {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 45s{color} | {color:red} hadoop-hdds/common in trunk has 1 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 32s{color} | {color:red} hadoop-ozone/tools in trunk has 2 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 26s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 0s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 28m 0s{color} | {color:red} root generated 9 new + 2 unchanged - 0 fixed = 11 total (was 2) {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 28m 0s{color} | {color:red} root generated 1217 new + 260 unchanged - 0 fixed = 1477 total (was 260) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 50s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-ozone/integration-test {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 44s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 8s{color} | {color:green} common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s{color} | {color:green} container-service in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s{color} | {color:green} client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 30s{color} | {color:green}
[jira] [Commented] (HDFS-13443) RBF: Update mount table cache immediately after changing (add/update/remove) mount table entries.
[ https://issues.apache.org/jira/browse/HDFS-13443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460955#comment-16460955 ] genericqa commented on HDFS-13443: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 56s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 16m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 13s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 59s{color} | {color:orange} hadoop-hdfs-project: The patch generated 2 new + 1 unchanged - 0 fixed = 3 total (was 1) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 11s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}101m 6s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m 19s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}215m 37s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.qjournal.server.TestJournalNodeSync | | | hadoop.hdfs.server.namenode.TestReencryptionWithKMS | | | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | HDFS-13443 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12921557/HDFS-13443.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460906#comment-16460906 ] genericqa commented on HDFS-13174: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 47s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 48s{color} | {color:red} hadoop-hdfs-project_hadoop-hdfs generated 2 new + 531 unchanged - 0 fixed = 533 total (was 531) {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 51s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 5 new + 752 unchanged - 1 fixed = 757 total (was 753) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 52s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}111m 31s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}167m 14s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.client.impl.TestBlockReaderLocal | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | HDFS-13174 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12921555/HDFS-13174.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 98292cd14a30 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e07156e | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | javac |
[jira] [Updated] (HDDS-16) Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition.
[ https://issues.apache.org/jira/browse/HDDS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-16: -- Attachment: HDDS-16.001.patch > Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition. > > > Key: HDDS-16 > URL: https://issues.apache.org/jira/browse/HDDS-16 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Native, Ozone Datanode >Affects Versions: 0.2.1 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-16.001.patch > > > The current Ozone code passes pipeline information to datanodes as well. > However datanodes do not use this information. > Hence Pipeline should be removed from ozone datanode commands. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-16) Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition.
[ https://issues.apache.org/jira/browse/HDDS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-16: -- Status: Patch Available (was: Open)
[jira] [Updated] (HDDS-16) Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition.
[ https://issues.apache.org/jira/browse/HDDS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-16: -- Attachment: (was: HDFS-12841-HDFS-7240.002.patch)
[jira] [Updated] (HDDS-16) Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition.
[ https://issues.apache.org/jira/browse/HDDS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-16: -- Attachment: (was: HDFS-12841-HDFS-7240.003.patch)
[jira] [Updated] (HDDS-16) Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition.
[ https://issues.apache.org/jira/browse/HDDS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-16: -- Attachment: (was: HDFS-12841-HDFS-7240.001.patch)
[jira] [Updated] (HDDS-16) Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition.
[ https://issues.apache.org/jira/browse/HDDS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-16: -- Attachment: (was: HDFS-12841-HDFS-7240.004.patch)
[jira] [Moved] (HDDS-16) Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition.
[ https://issues.apache.org/jira/browse/HDDS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh moved HDFS-12841 to HDDS-16: -- Fix Version/s: (was: HDFS-7240) 0.2.1 Affects Version/s: (was: HDFS-7240) 0.2.1 Target Version/s: (was: HDFS-7240) Component/s: (was: ozone) Ozone Datanode Native Workflow: patch-available, re-open possible (was: no-reopen-closed, patch-avail) Key: HDDS-16 (was: HDFS-12841) Project: Hadoop Distributed Data Store (was: Hadoop HDFS) > Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition. > > > Key: HDDS-16 > URL: https://issues.apache.org/jira/browse/HDDS-16 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Native, Ozone Datanode >Affects Versions: 0.2.1 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: 0.2.1 > > Attachments: HDFS-12841-HDFS-7240.001.patch, > HDFS-12841-HDFS-7240.002.patch, HDFS-12841-HDFS-7240.003.patch, > HDFS-12841-HDFS-7240.004.patch > > > The current Ozone code passes pipeline information to datanodes as well. > However datanodes do not use this information. > Hence Pipeline should be removed from ozone datanode commands. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12841) Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition.
[ https://issues.apache.org/jira/browse/HDFS-12841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDFS-12841: - Issue Type: Improvement (was: Sub-task) Parent: (was: HDFS-7240) > Ozone: Remove Pipeline from Datanode Container Protocol protobuf definition. > > > Key: HDFS-12841 > URL: https://issues.apache.org/jira/browse/HDFS-12841 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-12841-HDFS-7240.001.patch, > HDFS-12841-HDFS-7240.002.patch, HDFS-12841-HDFS-7240.003.patch, > HDFS-12841-HDFS-7240.004.patch > > > The current Ozone code passes pipeline information to datanodes as well. > However datanodes do not use this information. > Hence Pipeline should be removed from ozone datanode commands. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13245) RBF: State store DBMS implementation
[ https://issues.apache.org/jira/browse/HDFS-13245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460792#comment-16460792 ] Yiqun Lin commented on HDFS-13245: -- Some review comments from me: * I see {{TestStateStoreDisabledNameserviceStore}} is missing in current UT. We can add this in another JIRA or just add this here. * {{ENCODE_UTF8 = "UTF-8";}} can be replaced by {{StandardCharsets.UTF_8.name();}}. * {{testMetrics}} can be included in {{TestStateStore*SQLDB}}. > RBF: State store DBMS implementation > > > Key: HDFS-13245 > URL: https://issues.apache.org/jira/browse/HDFS-13245 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: maobaolong >Assignee: Yiran Wu >Priority: Major > Attachments: HDFS-13245.001.patch, HDFS-13245.002.patch, > HDFS-13245.003.patch, HDFS-13245.004.patch, HDFS-13245.005.patch, > HDFS-13245.006.patch > > > Add a DBMS implementation for the State Store. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
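The second review comment above suggests replacing the string literal {{"UTF-8"}} with {{StandardCharsets.UTF_8.name()}}. A minimal sketch of the suggestion (class and field names are illustrative, not the actual patch code): the constant becomes compile-time checked and cannot drift from the real charset name.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the suggested change: derive the charset name from the
// StandardCharsets constant instead of hard-coding the "UTF-8" literal.
public class EncodingConstantExample {
    // Before: private static final String ENCODE_UTF8 = "UTF-8";
    // After: the name comes from the JDK constant and can never be mistyped.
    private static final String ENCODE_UTF8 = StandardCharsets.UTF_8.name();

    public static void main(String[] args) {
        // Encoding with the Charset object also avoids the checked
        // UnsupportedEncodingException that the String-name overload throws.
        byte[] bytes = "mount-table".getBytes(StandardCharsets.UTF_8);
        System.out.println(ENCODE_UTF8 + ":" + bytes.length);
    }
}
```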
[jira] [Issue Comment Deleted] (HDFS-13369) FSCK Report broken with RequestHedgingProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ranith Sardar updated HDFS-13369: - Comment: was deleted (was: h4. Hi [~eddyxu] , [~andrew.wang] . Can you please review this patch? It is related to RequestHedgingProxyProvider and client.) > FSCK Report broken with RequestHedgingProxyProvider > > > Key: HDFS-13369 > URL: https://issues.apache.org/jira/browse/HDFS-13369 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.3 >Reporter: Harshakiran Reddy >Assignee: Ranith Sardar >Priority: Major > Attachments: HDFS-13369.001.patch, HDFS-13369.002.patch, > HDFS-13369.003.patch, HDFS-13369.004.patch > > > Scenario:- > 1.Configure the RequestHedgingProxy > 2. write some files in file system > 3. Take FSCK report for the above files > > {noformat} > bin> hdfs fsck /file1 -locations -files -blocks > Exception in thread "main" java.lang.ClassCastException: > org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider$RequestHedgingInvocationHandler > cannot be cast to org.apache.hadoop.ipc.RpcInvocationHandler > at org.apache.hadoop.ipc.RPC.getConnectionIdForProxy(RPC.java:626) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.getConnectionId(RetryInvocationHandler.java:438) > at org.apache.hadoop.ipc.RPC.getConnectionIdForProxy(RPC.java:628) > at org.apache.hadoop.ipc.RPC.getServerAddress(RPC.java:611) > at org.apache.hadoop.hdfs.HAUtil.getAddressOfActive(HAUtil.java:263) > at > org.apache.hadoop.hdfs.tools.DFSck.getCurrentNamenodeAddress(DFSck.java:257) > at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:319) > at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72) > at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:156) > at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:153) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:152) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:385){noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
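The stack trace above fails on an unconditional cast of a dynamic proxy's InvocationHandler to one concrete handler type. A minimal, self-contained reproduction of that pattern follows; the nested class names merely stand in for RequestHedgingInvocationHandler and RpcInvocationHandler and are not Hadoop's real types, and the instanceof guard shown is only one possible defensive shape, not the fix adopted in the patch.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Reproduces the failure pattern: a proxy built with one handler type is
// later cast to a different handler type, which throws ClassCastException.
public class ProxyCastExample {
    interface RpcProtocol { }

    // Stand-in for org.apache.hadoop.ipc.RpcInvocationHandler
    static class RpcHandler implements InvocationHandler {
        public Object invoke(Object p, java.lang.reflect.Method m, Object[] a) { return null; }
    }

    // Stand-in for RequestHedgingProxyProvider$RequestHedgingInvocationHandler
    static class HedgingHandler implements InvocationHandler {
        public Object invoke(Object p, java.lang.reflect.Method m, Object[] a) { return null; }
    }

    public static void main(String[] args) {
        RpcProtocol proxy = (RpcProtocol) Proxy.newProxyInstance(
                ProxyCastExample.class.getClassLoader(),
                new Class<?>[] { RpcProtocol.class },
                new HedgingHandler());
        InvocationHandler h = Proxy.getInvocationHandler(proxy);
        // The buggy pattern is the bare cast `(RpcHandler) h`, which throws
        // here. Checking the runtime type first avoids the exception:
        if (h instanceof RpcHandler) {
            System.out.println("rpc handler");
        } else {
            System.out.println("not an RpcHandler: " + h.getClass().getSimpleName());
        }
    }
}
```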
[jira] [Commented] (HDFS-11934) Add assertion to TestDefaultNameNodePort#testGetAddressFromConf
[ https://issues.apache.org/jira/browse/HDFS-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460726#comment-16460726 ] Akira Ajisaka commented on HDFS-11934: -- Thanks [~legend-hua] for the patch. The change looks good to me. Would you reverse the order of the arguments of assertEquals(expected, actual) in the class? I'm +1 if that is addressed. > Add assertion to TestDefaultNameNodePort#testGetAddressFromConf > --- > > Key: HDFS-11934 > URL: https://issues.apache.org/jira/browse/HDFS-11934 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-alpha4 >Reporter: legend >Priority: Minor > Attachments: HDFS-11934.patch > > > Add an additional assertion to TestDefaultNameNodePort, verify that > testGetAddressFromConf returns 555 if setDefaultUri(conf, "foo:555"). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
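The review above asks to reverse the arguments of assertEquals(expected, actual). The point is that JUnit's failure message reads "expected:<...> but was:<...>", so swapping the arguments produces a misleading report. A small self-contained illustration (the local helper mimics JUnit's signature only so the snippet runs without JUnit on the classpath):

```java
// Illustrates the assertEquals(expected, actual) argument order requested in
// the review. In the real test, org.junit.Assert.assertEquals is used.
public class AssertOrderExample {
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            // This message is only meaningful if `expected` really is the
            // expected value, which is why argument order matters.
            throw new AssertionError("expected:<" + expected + "> but was:<" + actual + ">");
        }
    }

    public static void main(String[] args) {
        int port = 555; // the value parsed from setDefaultUri(conf, "foo:555")
        // Wrong: assertEquals(port, 555) swaps expected and actual.
        // Right: the literal expected value comes first.
        assertEquals(555, port);
        System.out.println("ok");
    }
}
```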
[jira] [Updated] (HDFS-13443) RBF: Update mount table cache immediately after changing (add/update/remove) mount table entries.
[ https://issues.apache.org/jira/browse/HDFS-13443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Arshad updated HDFS-13443: --- Attachment: HDFS-13443.006.patch > RBF: Update mount table cache immediately after changing (add/update/remove) > mount table entries. > - > > Key: HDFS-13443 > URL: https://issues.apache.org/jira/browse/HDFS-13443 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs >Reporter: Mohammad Arshad >Assignee: Mohammad Arshad >Priority: Major > Labels: RBF > Attachments: HDFS-13443-branch-2.001.patch, > HDFS-13443-branch-2.002.patch, HDFS-13443.001.patch, HDFS-13443.002.patch, > HDFS-13443.003.patch, HDFS-13443.004.patch, HDFS-13443.005.patch, > HDFS-13443.006.patch > > > Currently the mount table cache is updated periodically; by default the cache is > updated every minute. After a change in the mount table, user operations may still > use the old mount table. This is a bit wrong. > To update the mount table cache, maybe we can do the following: > * *Add a refresh API in MountTableManager which will update the mount table cache.* > * *When there is a change in mount table entries, the router admin server can > update its cache and ask other routers to update their cache*. For example, if > there are three routers R1, R2, R3 in a cluster, then the add mount table entry API, > at the admin server side, will perform the following sequence of actions: > ## user submits an add mount table entry request on R1 > ## R1 adds the mount table entry in the state store > ## R1 calls the refresh API on R2 > ## R1 calls the refresh API on R3 > ## R1 directly refreshes its own cache > ## the add mount table entry response is sent back to the user. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13443) RBF: Update mount table cache immediately after changing (add/update/remove) mount table entries.
[ https://issues.apache.org/jira/browse/HDFS-13443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460723#comment-16460723 ] Mohammad Arshad commented on HDFS-13443: Addressed all the comments except the two above, and submitted a new patch, HDFS-13443.006.patch. bq. I would probably call it TestRemoteRouterMountTableRefresh or similar; no need for the ZK suffix either Changed to TestRouterMountTableCacheRefresh. bq. For stopping one Router, just pick one, no need to go through the list. We need to go through the list to pick the router that is not selected for other admin operations. > RBF: Update mount table cache immediately after changing (add/update/remove) > mount table entries. > - > > Key: HDFS-13443 > URL: https://issues.apache.org/jira/browse/HDFS-13443 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs >Reporter: Mohammad Arshad >Assignee: Mohammad Arshad >Priority: Major > Labels: RBF > Attachments: HDFS-13443-branch-2.001.patch, > HDFS-13443-branch-2.002.patch, HDFS-13443.001.patch, HDFS-13443.002.patch, > HDFS-13443.003.patch, HDFS-13443.004.patch, HDFS-13443.005.patch > > > Currently the mount table cache is updated periodically; by default the cache is > updated every minute. After a change in the mount table, user operations may still > use the old mount table. This is a bit wrong. > To update the mount table cache, maybe we can do the following > * *Add a refresh API in MountTableManager which will update the mount table cache.* > * *When there is a change in mount table entries, the router admin server can > update its cache and ask other routers to update their cache*.
For example, if > there are three routers R1, R2, R3 in a cluster, then the add mount table entry API, > at the admin server side, will perform the following sequence of actions: > ## user submits an add mount table entry request on R1 > ## R1 adds the mount table entry in the state store > ## R1 calls the refresh API on R2 > ## R1 calls the refresh API on R3 > ## R1 directly refreshes its own cache > ## the add mount table entry response is sent back to the user. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
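The six-step refresh sequence in the description above can be sketched as code. This is a hedged illustration only: the class and method names below (Router, refreshMountTableEntries, the in-memory state store map) are stand-ins invented for the sketch, not the API of the attached patch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the proposed flow: the admin router persists the entry in the
// state store, triggers a cache refresh on every other router, then
// refreshes its own cache before responding to the user.
public class MountTableRefreshSketch {
    // Stand-in for the shared state store backing the mount table.
    static final Map<String, String> STATE_STORE = new HashMap<>();

    static class Router {
        final String name;
        Map<String, String> cache = new HashMap<>();
        Router(String name) { this.name = name; }
        // Stand-in for the proposed MountTableManager refresh API.
        void refreshMountTableEntries() { cache = new HashMap<>(STATE_STORE); }
    }

    // Steps 1-6 from the description, run on the router (R1) that
    // received the admin request.
    static void addMountTableEntry(Router self, List<Router> peers,
                                   String src, String dst) {
        STATE_STORE.put(src, dst);            // 2. persist in the state store
        for (Router peer : peers) {
            peer.refreshMountTableEntries();  // 3-4. refresh R2, R3
        }
        self.refreshMountTableEntries();      // 5. refresh R1's own cache
    }                                         // 6. respond to the user

    public static void main(String[] args) {
        Router r1 = new Router("R1"), r2 = new Router("R2"), r3 = new Router("R3");
        addMountTableEntry(r1, List.of(r2, r3), "/data", "ns0:/data");
        // All three caches now agree immediately, without waiting for the
        // periodic (once-a-minute) refresh.
        System.out.println(r1.cache.equals(r2.cache) && r2.cache.equals(r3.cache));
    }
}
```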
[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Fajth updated HDFS-13174: Release Note: Mover could fail after 20+ minutes if a block move between two DataNodes was enqueued for this long, due to an internal constant that was introduced for the Balancer but affected the Mover as well. After the patch, the internal constant can be configured with the dfs.balancer.max-iteration-time parameter, and it affects only the Balancer. The default is 20 minutes. Status: Patch Available (was: Open) > hdfs mover -p /path times out after 20 min > -- > > Key: HDFS-13174 > URL: https://issues.apache.org/jira/browse/HDFS-13174 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Affects Versions: 3.0.0-alpha2, 2.7.4, 2.8.0 >Reporter: Istvan Fajth >Assignee: Istvan Fajth >Priority: Major > Attachments: HDFS-13174.001.patch > > > In HDFS-11015 there is an iteration timeout introduced in the Dispatcher.Source > class that is checked while dispatching the moves that the Balancer and the > Mover do. This timeout is hardwired to 20 minutes. > In the Balancer we have iterations, and even if an iteration is timing out, > the Balancer runs further and does another iteration before it fails if > no moves happened in a few iterations. > The Mover, on the other hand, does not have iterations, so if moving a path > runs for more than 20 minutes and there are moves decided and enqueued > between two DataNodes, after 20 minutes the Mover will stop with the following > exception reported to the console (lines might differ, as this exception came > from a CDH5.12.1 installation). 
> java.io.IOException: Block move timed out > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > > Note that this issue is not coming up if all blocks can be moved inside the > DataNodes without having to move the block to an other DataNode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
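Per the release note above, the timeout becomes tunable through dfs.balancer.max-iteration-time. A hypothetical hdfs-site.xml fragment is sketched below; the property name comes from the release note, but the value shown and the assumption that it is expressed in milliseconds are the editor's, not confirmed by this thread.

```xml
<!-- Hypothetical hdfs-site.xml fragment for the new Balancer setting.
     Assumes a millisecond unit; the stated default is 20 minutes. -->
<property>
  <name>dfs.balancer.max-iteration-time</name>
  <value>1200000</value>
</property>
```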
[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Fajth updated HDFS-13174: Description: In HDFS-11015 there is an iteration timeout introduced in Dispatcher.Source class, that is checked during dispatching the moves that the Balancer and the Mover does. This timeout is hardwired to 20 minutes. In the Balancer we have iterations, and even if an iteration is timing out the Balancer runs further and does an other iteration before it fails if there were no moves happened in a few iterations. The Mover on the other hand does not have iterations, so if moving a path runs for more than 20 minutes, and there are moves decided and enqueued between two DataNode, after 20 minutes Mover will stop with the following exception reported to the console (lines might differ as this exception came from a CDH5.12.1 installation). java.io.IOException: Block move timed out at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382) at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328) at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186) at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Note that this issue is not coming up if all blocks can be moved inside the DataNodes without having to move the block to an other DataNode. was: In HDFS-11015 there is an iteration timeout introduced in Dispatcher.Source class, that is checked during dispatching the moves that the Balancer and the Mover does. This timeout is hardwired to 20 minutes. 
In the Balancer we have iterations, and even if an iteration is timing out the Balancer runs further and does an other iteration before it fails if there were no moves happened in a few iterations. The Mover on the other hand does not have iterations, so if moving a path runs for more than 20 minutes, after 20 minutes Mover will stop with the following exception reported to the console (lines might differ as this exception came from a CDH5.12.1 installation): java.io.IOException: Block move timed out at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382) at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328) at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186) at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) > hdfs mover -p /path times out after 20 min > -- > > Key: HDFS-13174 > URL: https://issues.apache.org/jira/browse/HDFS-13174 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2 >Reporter: Istvan Fajth >Assignee: Istvan Fajth >Priority: Major > Attachments: HDFS-13174.001.patch > > > In HDFS-11015 there is an iteration timeout introduced in Dispatcher.Source > class, that is checked during dispatching the moves that the Balancer and the > Mover does. This timeout is hardwired to 20 minutes. > In the Balancer we have iterations, and even if an iteration is timing out > the Balancer runs further and does an other iteration before it fails if > there were no moves happened in a few iterations. 
> The Mover on the other hand does not have iterations, so if moving a path > runs for more than 20 minutes, and there are moves decided and enqueued > between two DataNode, after 20 minutes Mover will stop with the following > exception reported to the console (lines might differ as this exception came > from a CDH5.12.1 installation). > java.io.IOException: Block move timed out > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at
[jira] [Commented] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460711#comment-16460711 ] Istvan Fajth commented on HDFS-13174: - Attaching a patch for review. The patch contains some refactoring to make the iteration time configurable. I added a configuration key for the Balancer to control the maximum iteration time; it seemed reasonable to expose it, though it may not need to be, and in this initial patch I have exposed it. I added a test for the Balancer that checks the max iteration time is respected. To make that test run in a reasonable timeframe with a reasonable amount of resources, I had to use the deprecated DFSConfigKeys.DFS_CLIENT_SOCKET_TIMEOUT_KEY; if there is a better way to control how often the DataNode gets back to the client to keep the connection alive, I would be glad to know it, as this was the only way I found to affect that. The newly introduced HdfsClientConfigKeys.DFS_CLIENT_SOCKET_TIMEOUT_KEY is not visible in the test package, and I did not find a way to tune the same on the DataNode side. I also added a test in which, if you set the newly added Dispatcher constructor parameter to a value higher than 0 (for example 200L), the test fails because no blocks were moved, as the block moves timed out; this was the case with the previous constant. I am updating the Jira description as well, as I learned a few things about the issue.
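The patch's idea of threading the iteration time through a constructor parameter could look roughly like the sketch below. This is a hedged illustration only: the class name, the parameter semantics (a non-positive value meaning "no limit"), and the method names are assumptions for the example, not the actual patch.

```java
// Hypothetical sketch of a configurable iteration timeout passed in via a
// constructor parameter, as the patch description suggests. Treating values
// <= 0 as "no limit" is an assumption made for this example.
public class ConfigurableIterationTimeout {
    private final long maxIterationTimeMs;
    private final long startTimeMs;

    public ConfigurableIterationTimeout(long maxIterationTimeMs, long startTimeMs) {
        this.maxIterationTimeMs = maxIterationTimeMs;
        this.startTimeMs = startTimeMs;
    }

    /** Times out only when a positive limit is configured and exceeded. */
    public boolean isIterationOver(long nowMs) {
        return maxIterationTimeMs > 0 && nowMs - startTimeMs > maxIterationTimeMs;
    }
}
```

With such a parameter, the Balancer can keep a 20-minute default while the Mover (or a test, as described in the comment) passes a different value, e.g. 200L to force quick timeouts or a non-positive value to disable the check.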
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Fajth updated HDFS-13174: Attachment: HDFS-13174.001.patch
[jira] [Commented] (HDFS-13507) RBF: Remove update functionality from routeradmin's add cmd
[ https://issues.apache.org/jira/browse/HDFS-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460693#comment-16460693 ] Yiqun Lin commented on HDFS-13507: -- One comment: can we add the update command usage in {{HDFSRouterFederation.md}}? Since this is an incompatible change, we had better complete the related documentation. Others look good to me.

> RBF: Remove update functionality from routeradmin's add cmd
> ---
>
>                 Key: HDFS-13507
>                 URL: https://issues.apache.org/jira/browse/HDFS-13507
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Wei Yan
>            Assignee: Gang Li
>            Priority: Minor
>              Labels: incompatible
>         Attachments: HDFS-13507.000.patch, HDFS-13507.001.patch, HDFS-13507.002.patch
>
> Follow up to the discussion in HDFS-13326. We should remove the "update"
> functionality from routeradmin's add cmd, to make it consistent with RPC
> calls.
> Note that this is an incompatible change.
[jira] [Updated] (HDFS-13507) RBF: Remove update functionality from routeradmin's add cmd
[ https://issues.apache.org/jira/browse/HDFS-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-13507: - Labels: incompatible (was: )
[jira] [Updated] (HDFS-13488) RBF: Reject requests when a Router is overloaded
[ https://issues.apache.org/jira/browse/HDFS-13488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-13488: - Resolution: Fixed Status: Resolved (was: Patch Available)

> RBF: Reject requests when a Router is overloaded
>
>                 Key: HDFS-13488
>                 URL: https://issues.apache.org/jira/browse/HDFS-13488
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 3.0.1
>            Reporter: Íñigo Goiri
>            Assignee: Íñigo Goiri
>            Priority: Major
>             Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.4
>         Attachments: HDFS-13488.000.patch, HDFS-13488.001.patch, HDFS-13488.002.patch, HDFS-13488.003.patch, HDFS-13488.004.patch
>
> A Router might be overloaded when handling special cases (e.g. a slow
> subcluster). The Router could reject the requests and the client could try
> with another Router. We should leverage the Standby mechanism for this.
[jira] [Updated] (HDFS-13488) RBF: Reject requests when a Router is overloaded
[ https://issues.apache.org/jira/browse/HDFS-13488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-13488: - Affects Version/s: 3.0.1; Hadoop Flags: Reviewed; Target Version/s: 3.2.0, 3.1.1; Fix Version/s: 3.0.4, 2.9.2, 3.1.1, 3.2.0, 2.10.0. Committed this to trunk, branch-3.1, branch-3.0, branch-2, and branch-2.9. Thanks [~elgoiri] for the contribution.
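The overload-rejection idea from the issue description (reject when too many requests are in flight so the client can fail over to another Router) can be sketched as a simple admission guard. This is an illustrative simplification, not the committed implementation: the actual change lives in RouterRpcServer and uses HDFS's standby-exception retry mechanism, and the names below are invented for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hedged sketch of overload rejection: admit a request only while the number
// in flight is below a configured cap; otherwise the caller should fail over
// to another Router. Names are illustrative, not Hadoop's actual API.
public class OverloadGuard {
    private final int maxInFlight;
    private final AtomicInteger inFlight = new AtomicInteger();

    public OverloadGuard(int maxInFlight) {
        this.maxInFlight = maxInFlight;
    }

    /** Returns true if the request is admitted; false means "try another Router". */
    public boolean tryAdmit() {
        if (inFlight.incrementAndGet() > maxInFlight) {
            inFlight.decrementAndGet(); // roll back the optimistic increment
            return false;
        }
        return true;
    }

    /** Called when an admitted request finishes, freeing capacity. */
    public void release() {
        inFlight.decrementAndGet();
    }
}
```

In the real feature the rejection surfaces as an exception the client-side retry policy already understands, so existing clients transparently retry against a different Router, mirroring how standby NameNodes are handled.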
[jira] [Commented] (HDFS-13488) RBF: Reject requests when a Router is overloaded
[ https://issues.apache.org/jira/browse/HDFS-13488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460618#comment-16460618 ] Hudson commented on HDFS-13488: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14102 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14102/]) HDFS-13488. RBF: Reject requests when a Router is overloaded. (yqlin: rev 37269261d1232bc71708f30c76193188258ef4bd)
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/FederationTestUtils.java
* (delete) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterSafeModeException.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcClient.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/FederationRPCPerformanceMonitor.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRPCClientRetries.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcMonitor.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/FederationRPCMetrics.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/StateStoreDFSCluster.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterSafemode.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RBFConfigKeys.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/resources/hdfs-rbf-default.xml
* (add) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterClientRejectOverload.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/FederationRPCMBean.java
[jira] [Commented] (HDFS-13488) RBF: Reject requests when a Router is overloaded
[ https://issues.apache.org/jira/browse/HDFS-13488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460587#comment-16460587 ] Yiqun Lin commented on HDFS-13488: -- LGTM, +1. Committing.