[jira] [Created] (HDFS-14974) RBF: TestRouterSecurityManager#testCreateCredentials should use :0 for port
Íñigo Goiri created HDFS-14974:
--
Summary: RBF: TestRouterSecurityManager#testCreateCredentials should use :0 for port
Key: HDFS-14974
URL: https://issues.apache.org/jira/browse/HDFS-14974
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Íñigo Goiri

Currently, {{TestRouterSecurityManager#testCreateCredentials}} creates a Router with the default ports. However, these ports might already be in use. We should set them to :0 so they are assigned dynamically.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
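The fix relies on standard socket behavior: binding to port 0 asks the OS to choose a free port. A minimal sketch in plain Java (the class and method names are illustrative, not the actual Router test code):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortDemo {

    // Binding to port 0 asks the OS to assign any free port, so
    // concurrently running tests never collide on a fixed port.
    static int bindToFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort(); // the dynamically assigned port
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("OS assigned port: " + bindToFreePort());
    }
}
```

In test configuration this typically means setting the bind address to something like {{0.0.0.0:0}} instead of the default port.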
Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/500/

No changes
[jira] [Resolved] (HDDS-2104) Refactor OMFailoverProxyProvider#loadOMClientConfigs
[ https://issues.apache.org/jira/browse/HDDS-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siyao Meng resolved HDDS-2104.
--
Resolution: Fixed

> Refactor OMFailoverProxyProvider#loadOMClientConfigs
>
> Key: HDDS-2104
> URL: https://issues.apache.org/jira/browse/HDDS-2104
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: Siyao Meng
> Assignee: Siyao Meng
> Priority: Major
>
> Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321586979
> Now that we have decided to use client-side configuration for OM HA, some logic in
> OMFailoverProxyProvider#loadOMClientConfigs becomes redundant.
> The work will begin after HDDS-2007 is committed.
[jira] [Created] (HDDS-2454) Improve OM HA robot tests
Hanisha Koneru created HDDS-2454:
--
Summary: Improve OM HA robot tests
Key: HDDS-2454
URL: https://issues.apache.org/jira/browse/HDDS-2454
Project: Hadoop Distributed Data Store
Issue Type: Bug
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru

In one CI run, testOMHA.robot failed because the Robot Framework SSH commands failed. This Jira aims to verify that the command execution succeeds.
Next Wednesday (Nov 13) Hadoop storage online community sync
Hi,

I am happy to invite Zhenyu to join us to talk about the recent proposal of supporting ARM/aarch64 for Hadoop.

November 13 (Wednesday) US Pacific Time 10am / November 13 (Wednesday) Bangalore 11:30pm / November 14 (Thursday) Beijing 2am

Previous meeting notes: https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit

Access via Zoom: https://cloudera.zoom.us/j/880548968

One tap mobile:
+16465588656,,880548968# US (New York)
+17207072699,,880548968# US

Dial by your location:
+1 646 558 8656 US (New York)
+1 720 707 2699 US
877 853 5257 US Toll-free
888 475 4499 US Toll-free

Meeting ID: 880 548 968
Find your local number: https://zoom.us/u/acaGRDfMVl
[jira] [Created] (HDDS-2453) Add Freon tests for S3Bucket/MPU Keys
Bharat Viswanadham created HDDS-2453:
--
Summary: Add Freon tests for S3Bucket/MPU Keys
Key: HDDS-2453
URL: https://issues.apache.org/jira/browse/HDDS-2453
Project: Hadoop Distributed Data Store
Issue Type: Bug
Reporter: Bharat Viswanadham

This Jira is to create Freon tests for:
# S3 bucket creation
# S3 MPU key uploads
[jira] [Resolved] (HDDS-2410) Ozoneperf docker cluster should use privileged containers
[ https://issues.apache.org/jira/browse/HDDS-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharat Viswanadham resolved HDDS-2410.
--
Fix Version/s: 0.5.0
Resolution: Fixed

> Ozoneperf docker cluster should use privileged containers
>
> Key: HDDS-2410
> URL: https://issues.apache.org/jira/browse/HDDS-2410
> Project: Hadoop Distributed Data Store
> Issue Type: Task
> Reporter: Marton Elek
> Assignee: Marton Elek
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The profiler
> [servlet|https://github.com/elek/hadoop-ozone/blob/master/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/ProfileServlet.java]
> (which helps run a Java profiler in the background and publishes the result
> on the web interface) requires privileged docker containers.
>
> This flag is missing from the ozoneperf docker-compose cluster (which is
> designed to run performance tests).
[jira] [Created] (HDDS-2452) Wrong condition for re-scheduling in ReportPublisher
Attila Doroszlai created HDDS-2452:
--
Summary: Wrong condition for re-scheduling in ReportPublisher
Key: HDDS-2452
URL: https://issues.apache.org/jira/browse/HDDS-2452
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: Ozone Datanode
Reporter: Attila Doroszlai

It seems the condition for scheduling the next run of {{ReportPublisher}} is wrong:

{code:title=https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/report/ReportPublisher.java#L74-L76}
    if (!executor.isShutdown() ||
        !(context.getState() == DatanodeStates.SHUTDOWN)) {
      executor.schedule(this,
{code}

Given the condition above, the task may be scheduled again if the executor is shut down but the state machine is not yet set to shutdown (or vice versa). (Currently this is unlikely to happen, since [context state is set to shutdown before the report executor|https://github.com/apache/hadoop-ozone/blob/f928a0bdb4ea2e5195da39256c6dda9f1c855649/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeStateMachine.java#L392-L393].)

[~nanda], can you please confirm whether this is a typo or intentional?
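The difference between the two conditions is a De Morgan slip and can be shown with a small sketch (the method names below are illustrative; only the boolean expressions come from the report):

```java
public class RescheduleCondition {

    // The form from the snippet above: keeps rescheduling unless BOTH
    // the executor AND the state machine are shut down.
    static boolean reportedCondition(boolean executorShutdown, boolean stateShutdown) {
        return !executorShutdown || !stateShutdown;
    }

    // The likely intended form: reschedule only while NEITHER is shut down.
    static boolean intendedCondition(boolean executorShutdown, boolean stateShutdown) {
        return !executorShutdown && !stateShutdown;
    }

    public static void main(String[] args) {
        // Executor already shut down, state machine not yet:
        System.out.println(reportedCondition(true, false)); // true: still reschedules
        System.out.println(intendedCondition(true, false)); // false: stops
    }
}
```

The two forms only disagree when exactly one of the shutdown flags is set, which matches the window described in the report.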
[jira] [Created] (HDDS-2451) Use lazy string evaluation in preconditions
Attila Doroszlai created HDDS-2451:
--
Summary: Use lazy string evaluation in preconditions
Key: HDDS-2451
URL: https://issues.apache.org/jira/browse/HDDS-2451
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai

Avoid eagerly evaluating the error messages of preconditions (similarly to HDDS-2318, but there may be other occurrences of the same issue).
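The general pattern can be shown with the JDK's own {{Objects.requireNonNull}}, which offers both an eager String overload and a lazy Supplier overload (this is a generic sketch of the technique, not the actual HDDS code):

```java
import java.util.Objects;

public class LazyMessageDemo {

    static int expensiveCalls = 0;

    // Stands in for an expensive message build (string concatenation,
    // formatting, toString() of large objects, ...).
    static String expensiveMessage() {
        expensiveCalls++;
        return "value must not be null";
    }

    public static void main(String[] args) {
        String value = "ok";

        // Eager: the message is built even though the check passes.
        Objects.requireNonNull(value, expensiveMessage());

        // Lazy: the Supplier is invoked only if the check actually fails.
        Objects.requireNonNull(value, () -> expensiveMessage());

        // Only the eager call paid the cost of building the message.
        System.out.println("expensiveMessage() invoked " + expensiveCalls + " time(s)");
    }
}
```

With Guava's Preconditions the equivalent move is to pass a template and arguments (formatted only on failure) rather than concatenating the message at the call site.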
[jira] [Created] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly
Erik Krogen created HDFS-14973:
--
Summary: Balancer getBlocks RPC dispersal does not function properly
Key: HDFS-14973
URL: https://issues.apache.org/jira/browse/HDFS-14973
Project: Hadoop HDFS
Issue Type: Bug
Components: balancer & mover
Affects Versions: 3.0.0, 2.8.2, 2.7.4, 2.9.0
Reporter: Erik Krogen
Assignee: Erik Krogen

In HDFS-11384, a mechanism was added to disperse the {{getBlocks}} RPC calls issued by the Balancer/Mover, to alleviate load on the NameNode, since {{getBlocks}} can be very expensive and the Balancer should not impact normal cluster operation.

Unfortunately, this functionality does not work as expected, especially when the dispatcher thread count is low. The primary issue is that the delay is applied only to the first N threads submitted to the dispatcher's executor, where N is the size of the dispatcher's thread pool, but *not* to the first R threads, where R is the number of allowed {{getBlocks}} QPS (currently hardcoded to 20). For example, if the thread pool size is 100 (the default), threads 0-19 have no delay, threads 20-99 have increasing levels of delay, and threads 100+ have no delay.

As I understand it, the intent of the logic was that the delay applied to the first 100 threads would force all of the dispatcher executor's threads to be consumed, thus blocking subsequent (non-delayed) threads until the delay period had expired. However, threads 0-19 can finish very quickly (their work can often be completed in the time it takes to execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), opening up 20 new slots in the executor, which are then consumed by non-delayed threads 100-119, and so on. So although 80 threads have had a delay applied, the non-delayed threads rush through the 20 non-delayed slots.

The problem gets even worse when the dispatcher thread pool size is less than the max {{getBlocks}} QPS. For example, if the thread pool size is 10, _no threads ever have a delay applied_, and the feature is not enabled at all.
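One thread-count-independent alternative to per-thread start delays is a shared limiter that every getBlocks-style call must pass through; the cap then holds regardless of how threads are scheduled. A minimal sketch (not the actual Balancer code; the 20 QPS figure is the hardcoded limit mentioned above):

```java
import java.util.concurrent.TimeUnit;

public class SimpleQpsLimiter {

    private final long minIntervalNanos;
    private long nextAllowedNanos = System.nanoTime();

    SimpleQpsLimiter(int maxQps) {
        this.minIntervalNanos = TimeUnit.SECONDS.toNanos(1) / maxQps;
    }

    // Blocks until a call is allowed, so at most maxQps calls per second
    // pass through, regardless of how many dispatcher threads exist.
    synchronized void acquire() throws InterruptedException {
        long now = System.nanoTime();
        long waitNanos = nextAllowedNanos - now;
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
        nextAllowedNanos = Math.max(now, nextAllowedNanos) + minIntervalNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleQpsLimiter limiter = new SimpleQpsLimiter(20); // cap of 20 QPS
        long start = System.nanoTime();
        for (int i = 0; i < 5; i++) {
            limiter.acquire(); // would wrap each getBlocks RPC
        }
        System.out.println("5 calls took ~"
            + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start) + " ms");
    }
}
```

With a pool of 10 threads and a cap of 20 QPS this still throttles correctly, which the start-delay scheme described above cannot guarantee.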
[jira] [Created] (HDDS-2449) Delete block command should use a thread pool
Stephen O'Donnell created HDDS-2449:
---
Summary: Delete block command should use a thread pool
Key: HDDS-2449
URL: https://issues.apache.org/jira/browse/HDDS-2449
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Ozone Datanode
Affects Versions: 0.5.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell

The datanode receives commands over the heartbeat and queues them all on a single queue in StateContext.commandQueue. Inside DatanodeStateMachine, a single thread (started by the initCommandHandler thread) processes this queue and passes each command to a 'handler'; each command type has its own handler.

The delete block command executes immediately on the thread that processes the command queue. Therefore, if the delete is slow for some reason (it must access disk, so this is possible), it could cause other commands to back up. This should be changed to queue the deleteBlock command on a thread pool, in a similar way to ReplicateContainerCommand.
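The proposed change follows the standard pattern of having the dispatch thread only enqueue work while a pool of workers performs the slow I/O. A hedged sketch with a plain {{ExecutorService}} (class names and the pool size are illustrative, not the actual handler code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DeleteCommandPool {

    // Worker pool so slow, disk-bound deletes never block the single
    // command-dispatch thread. The size 4 is just an example.
    private final ExecutorService deleteExecutor = Executors.newFixedThreadPool(4);
    final AtomicInteger processed = new AtomicInteger();

    // Called from the dispatch thread: enqueue and return immediately.
    void handleDeleteBlockCommand(long blockId) {
        deleteExecutor.submit(() -> {
            // ... the slow disk access to delete the block would go here ...
            processed.incrementAndGet();
        });
    }

    void shutdownAndWait() throws InterruptedException {
        deleteExecutor.shutdown();
        deleteExecutor.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        DeleteCommandPool pool = new DeleteCommandPool();
        for (long id = 0; id < 100; id++) {
            pool.handleDeleteBlockCommand(id); // dispatch thread never blocks
        }
        pool.shutdownAndWait();
        System.out.println("processed " + pool.processed.get() + " delete commands");
    }
}
```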
[jira] [Created] (HDDS-2450) Datanode ReplicateContainer thread pool should be configurable
Stephen O'Donnell created HDDS-2450:
---
Summary: Datanode ReplicateContainer thread pool should be configurable
Key: HDDS-2450
URL: https://issues.apache.org/jira/browse/HDDS-2450
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Ozone Datanode
Affects Versions: 0.5.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell

The replicateContainer command uses a ReplicationSupervisor object to implement the thread pool used to process replication commands. In DatanodeStateMachine this thread pool is initialized with a hard-coded number of threads (10). This should be made configurable, with a default value of 10.
[jira] [Created] (HDDS-2448) Delete container command should use a thread pool
Stephen O'Donnell created HDDS-2448:
---
Summary: Delete container command should use a thread pool
Key: HDDS-2448
URL: https://issues.apache.org/jira/browse/HDDS-2448
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Ozone Datanode
Affects Versions: 0.5.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell

The datanode receives commands over the heartbeat and queues them all on a single queue in StateContext.commandQueue. Inside DatanodeStateMachine, a single thread (started by the initCommandHandler thread) processes this queue and passes each command to a 'handler'; each command type has its own handler.

The delete container command executes immediately on the thread that processes the command queue. Therefore, if the delete is slow for some reason (it must access disk, so this is possible), it could cause other commands to back up. This should be changed to queue the deleteContainer command on a thread pool, in a similar way to ReplicateContainerCommand.
[jira] [Resolved] (HDDS-1701) Move dockerbin script to libexec
[ https://issues.apache.org/jira/browse/HDDS-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Elek resolved HDDS-1701.
---
Fix Version/s: 0.5.0
Resolution: Fixed

> Move dockerbin script to libexec
>
> Key: HDDS-1701
> URL: https://issues.apache.org/jira/browse/HDDS-1701
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Eric Yang
> Assignee: YiSheng Lien
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The Ozone tarball structure contains a new bin script directory called dockerbin.
> These utility scripts can be relocated to OZONE_HOME/libexec because they are
> internal binaries that are not intended to be executed directly by users or
> shell scripts.
[jira] [Created] (HDFS-14972) HDFS: fsck "-blockId" option not giving expected output
Souryakanta Dwivedy created HDFS-14972:
--
Summary: HDFS: fsck "-blockId" option not giving expected output
Key: HDFS-14972
URL: https://issues.apache.org/jira/browse/HDFS-14972
Project: Hadoop HDFS
Issue Type: Bug
Components: tools
Affects Versions: 3.1.2
Environment: HA Cluster
Reporter: Souryakanta Dwivedy
Attachments: image-2019-11-08-19-10-18-057.png, image-2019-11-08-19-12-21-307.png

The HDFS fsck "-blockId" option is not giving the expected output.

HDFS fsck displays the correct output for corrupted files and blocks:
!image-2019-11-08-19-10-18-057.png!

The HDFS fsck -blockId command does not give the expected output for a corrupted replica:
!image-2019-11-08-19-12-21-307.png!
[jira] [Created] (HDFS-14971) HDFS : help info of fsck "-list-corruptfileblocks" command needs to be rectified
Souryakanta Dwivedy created HDFS-14971:
--
Summary: HDFS: help info of fsck "-list-corruptfileblocks" command needs to be rectified
Key: HDFS-14971
URL: https://issues.apache.org/jira/browse/HDFS-14971
Project: Hadoop HDFS
Issue Type: Improvement
Components: tools
Affects Versions: 3.1.2
Environment: HA Cluster
Reporter: Souryakanta Dwivedy
Attachments: image-2019-11-08-18-58-41-220.png

The help info of fsck "-list-corruptfileblocks" needs to be rectified. It currently reads:
"-list-corruptfileblocks print out list of missing blocks and files they belong to"

It should say "corrupted blocks and files", since the option reports corrupted blocks and the files they belong to, not missing ones.

Expected output:
"-list-corruptfileblocks print out list of corrupted blocks and files they belong to"

!image-2019-11-08-18-58-41-220.png!
[jira] [Created] (HDFS-14970) HDFS : fsck "-list-corruptfileblocks" command not giving expected output
Souryakanta Dwivedy created HDFS-14970:
--
Summary: HDFS: fsck "-list-corruptfileblocks" command not giving expected output
Key: HDFS-14970
URL: https://issues.apache.org/jira/browse/HDFS-14970
Project: Hadoop HDFS
Issue Type: Bug
Components: tools
Affects Versions: 3.1.2
Environment: HA Cluster
Reporter: Souryakanta Dwivedy
Attachments: image-2019-11-08-18-44-03-349.png, image-2019-11-08-18-45-53-858.png

The HDFS fsck "-list-corruptfileblocks" option is not giving the expected output.

Steps:
Check the corrupt files with fsck; it gives the correct output:
!image-2019-11-08-18-44-03-349.png!

Check the corrupt files with the fsck -list-corruptfileblocks option; it does not provide the expected output, which is wrong behavior:
!image-2019-11-08-18-45-53-858.png!
[jira] [Created] (HDDS-2447) Allow datanodes to operate with simulated containers
Stephen O'Donnell created HDDS-2447:
---
Summary: Allow datanodes to operate with simulated containers
Key: HDDS-2447
URL: https://issues.apache.org/jira/browse/HDDS-2447
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: Ozone Datanode
Affects Versions: 0.5.0
Reporter: Stephen O'Donnell

The Storage Container Manager (SCM) generally deals with datanodes and containers. Datanodes report their containers via container reports, and the SCM keeps track of them, schedules new replicas to be created when needed, etc. SCM does not care about individual blocks within the containers (aside from deleting them) or keys. Therefore it should be possible to scale test much of SCM without OM or worrying about writing keys.

In order to scale test SCM and some of its internal features like decommission, maintenance mode and the replication manager, it would be helpful to quickly create clusters with many containers, without needing to go through a data loading exercise. What I imagine happening is:

* We generate a list of container IDs and container sizes - this could be a fixed or configured size for all containers. We could also fix the number of blocks / chunks inside a 'generated simulated container' so they are all the same.
* When the datanode starts, if it has simulated containers enabled, it would optionally look for this list of containers and load the metadata into memory. Then it would report the containers to SCM as normal, and the SCM would believe the containers actually exist.
* If SCM creates a new container, the datanode should create the metadata in memory but not write anything to disk.
* If SCM instructs a DN to replicate a container, we should stream simulated data over the wire equivalent to the container size, but again throw away the data on the receiving side and store only the metadata in datanode memory.
* It would be acceptable for a DN restart to forget all containers and re-load them from the generated list. A nice-to-have feature would persist any changes to disk somehow, so a DN restart would return to its pre-restart state.

At this stage, I am not too concerned about OM, or about clients trying to read chunks out of these simulated containers (my focus is on SCM at the moment), but it would be great if that were possible too.

I believe this feature would let us do scale testing of SCM and benchmark some dead node / replication / decommission scenarios on clusters with much reduced hardware requirements. It would also allow clusters with a large number of containers to be created quickly, rather than going through a data load exercise. This would open the door to a tool similar to https://github.com/linkedin/dynamometer, which uses simulated storage on HDFS to perform scale tests against the namenode with reduced hardware requirements.

HDDS-1094 added the ability to have a level of simulated storage on a datanode. In that Jira, when a client writes data to a chunk, the data is thrown away and nothing is written to disk. If a client later tries to read the data back, it just gets zeroed byte buffers. Hopefully this Jira can build on that feature to fully simulate the containers from the SCM point of view, and later we can extend it to allow clients to create keys etc. too.
[jira] [Created] (HDDS-2446) ContainerReplica should contain DatanodeInfo rather than DatanodeDetails
Stephen O'Donnell created HDDS-2446:
---
Summary: ContainerReplica should contain DatanodeInfo rather than DatanodeDetails
Key: HDDS-2446
URL: https://issues.apache.org/jira/browse/HDDS-2446
Project: Hadoop Distributed Data Store
Issue Type: Sub-task
Components: SCM
Affects Versions: 0.5.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell

The ContainerReplica object is used by the SCM to track containers reported by the datanodes. The current fields stored in ContainerReplica are:

{code}
final private ContainerID containerID;
final private ContainerReplicaProto.State state;
final private DatanodeDetails datanodeDetails;
final private UUID placeOfBirth;
{code}

Now that we have introduced decommission and maintenance mode, the replication manager (and potentially other parts of the code) needs to know the status of the replica in terms of IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, etc. to make replication decisions. The DatanodeDetails object does not carry this information; however, the DatanodeInfo object extends DatanodeDetails and does carry the required information. As DatanodeInfo extends DatanodeDetails, any place that needs a DatanodeDetails can accept a DatanodeInfo instead.
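The compatibility argument is plain Java subtyping; a hypothetical sketch with stub classes (the real DatanodeDetails/DatanodeInfo carry many more fields than shown here):

```java
public class SubtypeDemo {

    // Stub of the base class: identity, network location, ports, ...
    static class DatanodeDetails { }

    // Stub of the subclass: adds the operational state the replication
    // manager needs for decommission/maintenance decisions.
    static class DatanodeInfo extends DatanodeDetails {
        private final String opState;
        DatanodeInfo(String opState) { this.opState = opState; }
        String getOpState() { return opState; }
    }

    // Existing API declared against the base type still compiles and runs
    // when handed the richer subtype.
    static String describe(DatanodeDetails dn) {
        return dn.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        DatanodeInfo info = new DatanodeInfo("DECOMMISSIONING");
        System.out.println(describe(info));    // accepted where DatanodeDetails is expected
        System.out.println(info.getOpState()); // extra state is still available
    }
}
```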
[jira] [Created] (HDDS-2445) Replace ToStringBuilder in BlockData
Attila Doroszlai created HDDS-2445:
--
Summary: Replace ToStringBuilder in BlockData
Key: HDDS-2445
URL: https://issues.apache.org/jira/browse/HDDS-2445
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai

{{BlockData#toString}} uses {{ToStringBuilder}} for ease of implementation. This has a few problems:
# {{ToStringBuilder}} uses {{StringBuffer}}, which is synchronized
# the default buffer is 512 bytes, more than needed here
# {{BlockID}} and {{ContainerBlockID}} both use another {{StringBuilder}} or {{StringBuffer}} for their {{toString}} implementation, leading to several allocations and copies

The flame graph shows that {{BlockData#toString}} may be responsible for 1.5% of total allocations while putting keys.
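A replacement along the lines the issue suggests uses a single pre-sized, unsynchronized {{StringBuilder}}; the class below is a simplified stand-in, not the real BlockData:

```java
public class BlockSummary {

    private final long containerId;
    private final long localId;

    BlockSummary(long containerId, long localId) {
        this.containerId = containerId;
        this.localId = localId;
    }

    // One unsynchronized StringBuilder sized close to the expected output,
    // instead of ToStringBuilder's 512-byte synchronized StringBuffer plus
    // the nested builders of the ID types.
    @Override
    public String toString() {
        return new StringBuilder(64)
            .append("BlockSummary{containerId=").append(containerId)
            .append(", localId=").append(localId)
            .append('}')
            .toString();
    }

    public static void main(String[] args) {
        System.out.println(new BlockSummary(1, 42));
        // BlockSummary{containerId=1, localId=42}
    }
}
```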
[jira] [Created] (HDDS-2444) Remove server side dependencies from ozonefs jar files
Marton Elek created HDDS-2444:
-
Summary: Remove server side dependencies from ozonefs jar files
Key: HDDS-2444
URL: https://issues.apache.org/jira/browse/HDDS-2444
Project: Hadoop Distributed Data Store
Issue Type: Task
Components: Ozone Filesystem
Reporter: Marton Elek

During the review of HDDS-2427 we found that some of the server-side dependencies (container-service, framework) are added to the ozonefs library jars. Server-side dependencies should be excluded from the client side to make the client safer and the build faster.