[
https://issues.apache.org/jira/browse/HDDS-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arpit Agarwal updated HDDS-2447:
--------------------------------
Target Version/s: 0.6.0 (was: 0.5.0)
Labels: TriagePending (was: )
> Allow datanodes to operate with simulated containers
> ----------------------------------------------------
>
> Key: HDDS-2447
> URL: https://issues.apache.org/jira/browse/HDDS-2447
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 0.5.0
> Reporter: Stephen O'Donnell
> Priority: Major
> Labels: TriagePending
>
> The Storage Container Manager (SCM) generally deals with datanodes and
> containers. Datanodes report their containers via container reports and the
> SCM keeps track of them, schedules new replicas to be created when needed,
> and so on. SCM does not care about individual blocks within the containers
> (aside from deleting them) or about keys. Therefore it should be possible to
> scale test much of SCM without involving OM or writing keys.
> In order to scale test SCM and some of its internal features, like
> decommission, maintenance mode and the replication manager, it would be
> helpful to quickly create clusters with many containers, without needing to
> go through a data loading exercise.
> What I imagine happening is:
> * We generate a list of container IDs and container sizes - this could be a
> fixed size or configured size for all containers. We could also fix the
> number of blocks / chunks inside a 'generated simulated container' so they
> are all the same.
> * When the Datanode starts, if it has simulated containers enabled, it would
> optionally look for this list of containers and load the metadata into
> memory. Then it would report the containers to SCM as normal, and the SCM
> would believe the containers actually exist.
> * If SCM creates a new container, then the datanode should create the
> metadata in memory, but not write anything to disk.
> * If SCM instructs a DN to replicate a container, then we should stream
> simulated data over the wire equivalent to the container size, but again
> throw away the data at the receiving side and store only the metadata in
> datanode memory.
> * It would be acceptable for a DN restart to forget all containers and
> re-load them from the generated list. A nice-to-have feature would persist
> any metadata changes to disk so that a DN restart would return the node to
> its pre-restart state.
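The steps above could be sketched as an in-memory container store on the datanode. This is a minimal sketch under stated assumptions: the class and method names (SimulatedContainerStore, ContainerMeta, importReplica, etc.) are hypothetical and do not correspond to the actual Ozone datanode container API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of an in-memory simulated container store.
// All class and method names here are hypothetical illustrations,
// not the real Ozone datanode API.
public class SimulatedContainerStore {

    // Immutable metadata for one simulated container.
    static final class ContainerMeta {
        final long containerId;
        final long sizeBytes;   // fixed/configured size for all containers
        final int blockCount;   // fixed number of blocks per container

        ContainerMeta(long containerId, long sizeBytes, int blockCount) {
            this.containerId = containerId;
            this.sizeBytes = sizeBytes;
            this.blockCount = blockCount;
        }
    }

    private final Map<Long, ContainerMeta> containers = new ConcurrentHashMap<>();

    // Step 1: generate a list of container IDs with a uniform size and
    // block count, so all simulated containers look the same.
    public void generateContainers(long firstId, int count,
                                   long sizeBytes, int blockCount) {
        for (long id = firstId; id < firstId + count; id++) {
            containers.put(id, new ContainerMeta(id, sizeBytes, blockCount));
        }
    }

    // Step 3: SCM asks for a new container -- create metadata in memory
    // only; nothing is written to disk.
    public void createContainer(long containerId, long sizeBytes, int blockCount) {
        containers.putIfAbsent(containerId,
            new ContainerMeta(containerId, sizeBytes, blockCount));
    }

    // Step 4: receiving side of a replication -- the streamed bytes are
    // thrown away; only record that the replica now "exists".
    public void importReplica(long containerId, long streamedBytes, int blockCount) {
        containers.put(containerId,
            new ContainerMeta(containerId, streamedBytes, blockCount));
    }

    // Step 2: what a container report to SCM would be built from.
    public int containerCount() {
        return containers.size();
    }

    public static void main(String[] args) {
        SimulatedContainerStore store = new SimulatedContainerStore();
        store.generateContainers(1, 1000, 5L * 1024 * 1024 * 1024, 100);
        store.createContainer(2001, 5L * 1024 * 1024 * 1024, 100);
        System.out.println(store.containerCount()); // 1001
    }
}
```

A ConcurrentHashMap is used because container reports and SCM-driven creates would arrive on different threads; since no data is ever written, the memory cost per container is a few dozen bytes, which is what makes large simulated clusters cheap.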
> At this stage, I am not too concerned about OM, or clients trying to read
> chunks out of these simulated containers (my focus is on SCM at the moment),
> but it would be great if that were possible too.
> I believe this feature would let us do scale testing of SCM and benchmark
> some dead node / replication / decommission scenarios on clusters with much
> reduced hardware requirements.
> It would also allow clusters with a large number of containers to be created
> quickly, rather than going through a data-loading exercise.
> This would open the door to a tool similar to
> https://github.com/linkedin/dynamometer which uses simulated storage on HDFS
> to perform scale tests against the NameNode with reduced hardware
> requirements.
> HDDS-1094 added the ability to have a level of simulated storage on a
> datanode. In that Jira, when a client writes data to a chunk the data is
> thrown away and nothing is written to disk. If a client later tries to read
> the data back, it just gets zeroed byte buffers. Hopefully this Jira could
> build on that feature to fully simulate the containers from the SCM point of
> view and later we can extend to allowing clients to create keys etc too.
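The HDDS-1094 read behaviour described above can be pictured with a short sketch: since chunk writes are discarded, a later read simply returns a zero-filled buffer of the requested length. The class and method names here are hypothetical, not the real Ozone chunk-manager API.

```java
import java.nio.ByteBuffer;

// Sketch of the HDDS-1094 read path for simulated storage: writes were
// discarded, so a read just returns zeroed byte buffers of the requested
// length. Names are hypothetical illustrations, not the actual API.
public class SimulatedChunkReader {

    public static ByteBuffer readChunk(long containerId, long blockId, int length) {
        // No data was ever persisted; ByteBuffer.allocate() returns a
        // zero-filled buffer, which is exactly the simulated behaviour.
        return ByteBuffer.allocate(length);
    }

    public static void main(String[] args) {
        ByteBuffer buf = readChunk(1L, 42L, 16);
        boolean allZero = true;
        for (int i = 0; i < buf.remaining(); i++) {
            if (buf.get(i) != 0) {
                allZero = false;
            }
        }
        System.out.println(allZero); // true
    }
}
```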
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]