Re: [VOTE] Merging branch HDFS-7240 to trunk

Clay B. Thu, 01 Mar 2018 14:46:08 -0800

Oops, retrying now subscribed to more than solely yarn-dev.


-Clay

On Wed, 28 Feb 2018, Clay B. wrote:

+1 (non-binding)
I have walked through the code and find it very compelling as a user; I really look forward to seeing the Ozone code mature and it maturing HDFS features together. The points which excite me as an eight-year HDFS user are:
* Excitement for making the datanode a storage technology container - this patch clearly brings fresh thought to HDFS, keeping it from growing stale
* Ability to build upon a shared storage infrastructure for diverse loads: I do not want to have "stranded" storage capacity or have to manage competing storage systems on the same disks (and further I want the metrics datanodes can provide me today, so I do not have to instrument two systems or evolve their instrumentation separately).
* Looking forward to supporting object-sized files!
* Moves HDFS in the right direction to test out new block management techniques for scaling HDFS. I am really excited to see the Raft integration; I hope it opens a new era in Hadoop matching modern systems design with new consistency and replication options in our ever distributed ecosystem.

-Clay

On Mon, 26 Feb 2018, Jitendra Pandey wrote:
   Dear folks,
We would like to start a vote to merge the HDFS-7240 branch into trunk. The context can be reviewed in the DISCUSSION thread, and in the jiras (see references below).
HDFS-7240 introduces Hadoop Distributed Storage Layer (HDSL), which is a distributed, replicated block layer. The old HDFS namespace and NN can be connected to this new block layer as we have described in HDFS-10419. We also introduce a key-value namespace called Ozone built on HDSL.
The code is in a separate module and is turned off by default. In a secure setup, HDSL and Ozone daemons cannot be started.
   The detailed documentation is available at
            
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Distributed+Storage+Layer+and+Applications


   I will start with my vote.
           +1 (binding)


   Discussion Thread:
             https://s.apache.org/7240-merge
             https://s.apache.org/4sfU

   Jiras:
              https://issues.apache.org/jira/browse/HDFS-7240
              https://issues.apache.org/jira/browse/HDFS-10419
              https://issues.apache.org/jira/browse/HDFS-13074
              https://issues.apache.org/jira/browse/HDFS-13180


   Thanks
   jitendra





           DISCUSSION THREAD SUMMARY :
On 2/13/18, 6:28 PM, "sanjay Radia"<sanjayo...@gmail.com> wrote:
Sorry, the formatting got messed up by my email client. Here it is again.

Dear Hadoop Community Members,

We had multiple community discussions, a few meetings in smaller groups, and also jira discussions with respect to this thread. We express our gratitude for participation and valuable comments.

The key questions raised were the following:
1) How the new block storage layer and OzoneFS benefit HDFS; we were asked to chalk out a roadmap towards the goal of a scalable namenode working with the new storage layer.
2) We were asked to provide a security design.
3) There were questions around stability given Ozone brings in a large body of code.
4) Why can't they be separate projects forever, or merged in when production ready?

We have responded to all the above questions with detailed explanations and answers on the jira as well as in the discussions. We believe that should sufficiently address the community's concerns.

Please see the summary below:
1) The new code base benefits HDFS scaling and a roadmap has been provided.
Summary:
- New block storage layer addresses the scalability of the block layer. We have shown how the existing NN can be connected to the new block layer and its benefits. We have shown 2 milestones; the 1st milestone is much simpler than the 2nd milestone while giving almost the same scaling benefits. Originally we had proposed simply milestone 2, and the community felt that removing the FSN/BM lock was a fair amount of work and a simpler solution would be useful.
- We provide a new K-V namespace called OzoneFS with FileSystem/FileContext plugins to allow the users to use the new system. BTW Hive and Spark work very well on KV-namespaces on the cloud. This will facilitate stabilizing the new block layer.
- The new block layer has a new netty based protocol engine in the Datanode which, when stabilized, can be used by the old hdfs block layer. See details below on sharing of code.
2) Stability impact on the existing HDFS code base and code separation. The new block layer and the OzoneFS are in modules that are separate from old HDFS code - currently there are no calls from HDFS into Ozone except for DN starting the new block layer module if configured to do so. It does not add instability (the instability argument has been raised many times). Over time as we share code, we will ensure that the old HDFS continues to remain stable (for example, we plan to stabilize the new netty based protocol engine in the new block layer before sharing it with HDFS's old block layer).
3) In the short term and medium term, the new system and HDFS will be used side-by-side by users: side-by-side usage in the short term for testing, and side-by-side in the medium term for actual production use till the new system has feature parity with old HDFS. During this time, sharing the DN daemon and admin functions between the two systems is operationally important:
- Sharing DN daemon to avoid additional operational daemon lifecycle management.
- Common decommissioning of the daemon and DN: one place to decommission for a node and its storage.
- Replacing failed disks and internal balancing capacity across disks - this needs to be done for both the current HDFS blocks and the new block-layer blocks.
- Balancer: we would like to use the same balancer and provide a common way to balance and common management of the bandwidth used for balancing.
- Security configuration setup - reuse existing setup for DNs rather than a new one for an independent cluster.
4) Need to easily share the block layer code between the two systems when used side-by-side. Areas where sharing code is desired over time:
- Sharing new block layer's new netty based protocol engine for old HDFS DNs (a long time sore issue for HDFS block layer).
- Shallow data copy from old system to new system is practical only if within same project and daemon; otherwise we have to deal with security settings and coordination across daemons. Shallow copy is useful as customers migrate from old to new.
- Shared disk scheduling in the future, and in the short term have a single round robin rather than independent round robins.
While sharing code across projects is technically possible (anything is possible in software), it is significantly harder, typically requiring cleaner public APIs etc. Sharing within a project through internal APIs is often simpler (such as the protocol engine that we want to share).
5) Security design, including a threat model and the solution, has been posted.
6) Temporary separation and merge later: several of the comments in the jira have argued that we temporarily separate the two code bases for now and then later merge them when the new code is stable:
- If there is agreement to merge later, why bother separating now - there need to be good reasons to separate now. We have addressed the stability and separation of the new code from existing above.
- Merging the new code back into HDFS later will be harder.
** The code and goals will diverge further.
** We will be taking on extra work to split and then take extra work to merge.
** The issues raised today will be raised all the same then.
               
---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
---------------------------------------------------------------------
