Re: LinkedIn Dynamometer Tool (was About 2.7.4 Release)
Hi Sanjay,

Actually I was not aware of that work… This seems to be a better way of achieving some of the same things we do externally to the DN process. I will look into reimplementing some parts on top of this; it seems this should require only some very small extensions to DataNodeCluster.

Thank you very much for the pointer!

Erik

On 7/21/17, 11:01 AM, "sanjay Radia" wrote:

Erik

Great stuff. BTW did you build on top of the “simulated data nodes” in HDFS, which have a way of storing only the length of the data (but not the real data)? That work allowed supplementing with a matching editsLog for the NN. Your approach of using a real image has the advantage of being able to replay traces from audit logs.
(Ref https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DataNodeCluster.java)

thanks
sanjay

> On Jul 20, 2017, at 10:42 AM, Erik Krogen wrote:
>
> forking off of the 2.7.4 release thread to answer this question about
> Dynamometer
>
> Dynamometer is a tool developed at LinkedIn for scale testing HDFS,
> specifically the NameNode. We have been using it for some time now and have
> recently been making some enhancements to ease of use and reproducibility.
> We hope to post a blog post sometime in the not-too-distant future, and
> also to open source it. I can provide some details here given that we have
> been leveraging it as part of our 2.7.4 release / upgrade process (in
> addition to previous upgrades).
>
> The basic idea is to get full-scale black-box testing of the HDFS NN while
> using significantly less (~10%) hardware than a real cluster of that size
> would require. We use real NN images from our at-scale clusters paired with
> some logic to fake out DNs into thinking they are storing data when they
> are not, allowing us to stuff more DNs onto each machine. Since we use a
> real image, we can replay real traces (collected from audit logs) to
> compare actual production performance vs. performance on this simulated
> cluster (with additional tuning, different version, etc.). We leverage YARN
> to manage setting up this cluster and to replay the traces.
>
> Happy to answer questions.
>
> Erik
>
> On Wed, Jul 19, 2017 at 5:05 PM, Konstantin Shvachko
> wrote:
>
>> Hi Tianyi,
>>
>> Glad you are interested in Dynamometer. Erik (CC-ed) is actively working
>> on this project right now, I'll let him elaborate.
>> Erik, you should probably respond on Apache dev list, as I think it could
>> be interesting for other people as well, since we planned to open source
>> it. You can fork the "About 2.7.4 Release" thread with a new subject and
>> give some details about Dynamometer there.
>>
>> Thanks,
>> --Konstantin
>>
>> On Wed, Jul 19, 2017 at 1:40 AM, 何天一 wrote:
>>
>>> Hi, Shvachko.
>>>
>>> You mentioned an internal tool called Dynamometer to test NameNode
>>> performance earlier in the 2.7.4 release thread.
>>> I wonder if you could share some ideas behind the tool. Or is there a
>>> plan to bring Dynamometer to open source community?
>>>
>>> Thanks.
>>>
>>> BR,
>>> Tianyi
>>>
>>> On Fri, Jul 14, 2017 at 8:45 AM Konstantin Shvachko
>>> wrote:
>>>

Hi everybody.

We have been doing some internal testing of Hadoop 2.7.4. The testing is going well. Did not find any major issues on our workloads. Used an internal tool called Dynamometer to check NameNode performance on real cluster traces. Good. Overall test cluster performance looks good. Some more testing is still going on.

I plan to build an RC next week, if there are no objections.

Thanks,
--Konst

On Thu, Jun 15, 2017 at 4:42 PM, Konstantin Shvachko <shv.had...@gmail.com> wrote:

> Hey guys.
>
> An update on 2.7.4 progress.
> We are down to 4 blockers. There is some work remaining on those.
> https://issues.apache.org/jira/browse/HDFS-11896?filter=12340814
> Would be good if people could follow up on review comments.
>
> I looked through nightly Jenkins build results for 2.7.4 both on Apache
> Jenkins and internal.
> Some tests fail intermittently, but there are no consistent failures. I filed
> HDFS-11985 to track some of them.
> https://issues.apache.org/jira/browse/HDFS-11985
> I do not currently consider these failures as blockers. LMK if some of
> them are.
>
> We started
Re: LinkedIn Dynamometer Tool (was About 2.7.4 Release)
Erik

Great stuff. BTW did you build on top of the “simulated data nodes” in HDFS, which have a way of storing only the length of the data (but not the real data)? That work allowed supplementing with a matching editsLog for the NN. Your approach of using a real image has the advantage of being able to replay traces from audit logs.
(Ref https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DataNodeCluster.java)

thanks
sanjay

> On Jul 20, 2017, at 10:42 AM, Erik Krogen wrote:
>
> forking off of the 2.7.4 release thread to answer this question about
> Dynamometer
>
> Dynamometer is a tool developed at LinkedIn for scale testing HDFS,
> specifically the NameNode. We have been using it for some time now and have
> recently been making some enhancements to ease of use and reproducibility.
> We hope to post a blog post sometime in the not-too-distant future, and
> also to open source it. I can provide some details here given that we have
> been leveraging it as part of our 2.7.4 release / upgrade process (in
> addition to previous upgrades).
>
> The basic idea is to get full-scale black-box testing of the HDFS NN while
> using significantly less (~10%) hardware than a real cluster of that size
> would require. We use real NN images from our at-scale clusters paired with
> some logic to fake out DNs into thinking they are storing data when they
> are not, allowing us to stuff more DNs onto each machine. Since we use a
> real image, we can replay real traces (collected from audit logs) to
> compare actual production performance vs. performance on this simulated
> cluster (with additional tuning, different version, etc.). We leverage YARN
> to manage setting up this cluster and to replay the traces.
>
> Happy to answer questions.
>
> Erik
>
> On Wed, Jul 19, 2017 at 5:05 PM, Konstantin Shvachko
> wrote:
>
>> Hi Tianyi,
>>
>> Glad you are interested in Dynamometer. Erik (CC-ed) is actively working
>> on this project right now, I'll let him elaborate.
>> Erik, you should probably respond on Apache dev list, as I think it could
>> be interesting for other people as well, since we planned to open source
>> it. You can fork the "About 2.7.4 Release" thread with a new subject and
>> give some details about Dynamometer there.
>>
>> Thanks,
>> --Konstantin
>>
>> On Wed, Jul 19, 2017 at 1:40 AM, 何天一 wrote:
>>
>>> Hi, Shvachko.
>>>
>>> You mentioned an internal tool called Dynamometer to test NameNode
>>> performance earlier in the 2.7.4 release thread.
>>> I wonder if you could share some ideas behind the tool. Or is there a
>>> plan to bring Dynamometer to open source community?
>>>
>>> Thanks.
>>>
>>> BR,
>>> Tianyi
>>>
>>> On Fri, Jul 14, 2017 at 8:45 AM Konstantin Shvachko
>>> wrote:
>>>

Hi everybody.

We have been doing some internal testing of Hadoop 2.7.4. The testing is going well. Did not find any major issues on our workloads. Used an internal tool called Dynamometer to check NameNode performance on real cluster traces. Good. Overall test cluster performance looks good. Some more testing is still going on.

I plan to build an RC next week, if there are no objections.

Thanks,
--Konst

On Thu, Jun 15, 2017 at 4:42 PM, Konstantin Shvachko <shv.had...@gmail.com> wrote:

> Hey guys.
>
> An update on 2.7.4 progress.
> We are down to 4 blockers. There is some work remaining on those.
> https://issues.apache.org/jira/browse/HDFS-11896?filter=12340814
> Would be good if people could follow up on review comments.
>
> I looked through nightly Jenkins build results for 2.7.4 both on Apache
> Jenkins and internal.
> Some tests fail intermittently, but there are no consistent failures. I filed
> HDFS-11985 to track some of them.
> https://issues.apache.org/jira/browse/HDFS-11985
> I do not currently consider these failures as blockers. LMK if some of
> them are.
>
> We started internal testing of branch-2.7 on one of our smallish (100+
> nodes) test clusters.
> Will update on the results.
>
> There is a plan to enable BigTop for 2.7.4 testing.
>
> Akira, Brahma thank you for setting up a wiki page for 2.7.4 release.
> Thank you everybody for contributing to this effort.
>
> Regards,
> --Konstantin
>
>
> On Tue, May 30, 2017 at 12:08 AM, Akira Ajisaka
> wrote:
>
>> Sure.
>> If you want to edit the wiki, please tell me your ASF confluence account.
>>
>> -Akira
>>
>> On 2017/05/30 15:31, Rohith Sharma K S wrote:
>>
>>> A couple more JIRAs need to be backported for the 2.7.4 release. These will
>>> solve RM HA instability issues.
>>>
Re: LinkedIn Dynamometer Tool (was About 2.7.4 Release)
Hi Erik,

Looking forward to the release of this tool. Thank you very much for the contribution.

I had a couple of questions about how the tool works.

1. Would you be able to provide the traces along with this tool? In other words, would I be able to use this out of the box, or do I have to build up traces myself?

2. Could you explain how the “fake out DNs into thinking they are storing data” logic works? Or I can be patient and read your blog post too.

Thanks
Anu

On 7/20/17, 10:42 AM, "Erik Krogen" wrote:

>forking off of the 2.7.4 release thread to answer this question about
>Dynamometer
>
>Dynamometer is a tool developed at LinkedIn for scale testing HDFS,
>specifically the NameNode. We have been using it for some time now and have
>recently been making some enhancements to ease of use and reproducibility.
>We hope to post a blog post sometime in the not-too-distant future, and
>also to open source it. I can provide some details here given that we have
>been leveraging it as part of our 2.7.4 release / upgrade process (in
>addition to previous upgrades).
>
>The basic idea is to get full-scale black-box testing of the HDFS NN while
>using significantly less (~10%) hardware than a real cluster of that size
>would require. We use real NN images from our at-scale clusters paired with
>some logic to fake out DNs into thinking they are storing data when they
>are not, allowing us to stuff more DNs onto each machine. Since we use a
>real image, we can replay real traces (collected from audit logs) to
>compare actual production performance vs. performance on this simulated
>cluster (with additional tuning, different version, etc.). We leverage YARN
>to manage setting up this cluster and to replay the traces.
>
>Happy to answer questions.
>
>Erik
>
>On Wed, Jul 19, 2017 at 5:05 PM, Konstantin Shvachko
>wrote:
>
>> Hi Tianyi,
>>
>> Glad you are interested in Dynamometer. Erik (CC-ed) is actively working
>> on this project right now, I'll let him elaborate.
>> Erik, you should probably respond on Apache dev list, as I think it could
>> be interesting for other people as well, since we planned to open source
>> it. You can fork the "About 2.7.4 Release" thread with a new subject and
>> give some details about Dynamometer there.
>>
>> Thanks,
>> --Konstantin
>>
>> On Wed, Jul 19, 2017 at 1:40 AM, 何天一 wrote:
>>
>>> Hi, Shvachko.
>>>
>>> You mentioned an internal tool called Dynamometer to test NameNode
>>> performance earlier in the 2.7.4 release thread.
>>> I wonder if you could share some ideas behind the tool. Or is there a
>>> plan to bring Dynamometer to open source community?
>>>
>>> Thanks.
>>>
>>> BR,
>>> Tianyi
>>>
>>> On Fri, Jul 14, 2017 at 8:45 AM Konstantin Shvachko
>>> wrote:
>>>

Hi everybody.

We have been doing some internal testing of Hadoop 2.7.4. The testing is going well. Did not find any major issues on our workloads. Used an internal tool called Dynamometer to check NameNode performance on real cluster traces. Good. Overall test cluster performance looks good. Some more testing is still going on.

I plan to build an RC next week, if there are no objections.

Thanks,
--Konst

On Thu, Jun 15, 2017 at 4:42 PM, Konstantin Shvachko <shv.had...@gmail.com> wrote:

> Hey guys.
>
> An update on 2.7.4 progress.
> We are down to 4 blockers. There is some work remaining on those.
> https://issues.apache.org/jira/browse/HDFS-11896?filter=12340814
> Would be good if people could follow up on review comments.
>
> I looked through nightly Jenkins build results for 2.7.4 both on Apache
> Jenkins and internal.
> Some tests fail intermittently, but there are no consistent failures. I filed
> HDFS-11985 to track some of them.
> https://issues.apache.org/jira/browse/HDFS-11985
> I do not currently consider these failures as blockers. LMK if some of
> them are.
>
> We started internal testing of branch-2.7 on one of our smallish (100+
> nodes) test clusters.
> Will update on the results.
>
> There is a plan to enable BigTop for 2.7.4 testing.
>
> Akira, Brahma thank you for setting up a wiki page for 2.7.4 release.
> Thank you everybody for contributing to this effort.
>
> Regards,
> --Konstantin
>
>
> On Tue, May 30, 2017 at 12:08 AM, Akira Ajisaka
> wrote:
>
>> Sure.
>> If you want to edit the wiki, please tell me your ASF confluence account.
>>
>> -Akira
>>
>> On 2017/05/30 15:31, Rohith Sharma K S wrote:
>>
>>> A couple more JIRAs need to be backported for the 2.7.4 release. These will
>>> solve RM HA instability issues.
>>> https://issues.apache.org/jira/browse/YARN-5333
>>>
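
A note on the trace replay idea discussed in this thread: the traces are described as being collected from audit logs. Purely as an illustration, the sketch below turns one HDFS audit log line into a record that a replay driver could consume. The field layout (tab-separated key=value pairs after an "audit:" prefix), the sample line, and the names AuditOp and parse_audit_line are all assumptions made for this example; nothing here reflects Dynamometer's actual trace format or code, which the thread does not describe.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical sample line; the layout is an assumption about a typical
# HDFS audit log entry, not something taken from the thread.
SAMPLE = (
    "2017-07-20 10:42:00,123 INFO FSNamesystem.audit: allowed=true\t"
    "ugi=alice (auth:SIMPLE)\tip=/10.0.0.1\tcmd=open\t"
    "src=/data/part-0\tdst=null\tperm=null"
)


class AuditOp(NamedTuple):
    """One namespace operation extracted from an audit log line."""
    timestamp: str
    user: str
    cmd: str
    src: str
    dst: Optional[str]


# Timestamp, then anything up to "audit: ", then the key=value payload.
_PREFIX = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?audit: (.*)$"
)


def parse_audit_line(line: str) -> Optional[AuditOp]:
    """Return an AuditOp, or None for lines that cannot be replayed."""
    match = _PREFIX.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = dict(
        kv.split("=", 1) for kv in match.group(2).split("\t") if "=" in kv
    )
    # Skip operations that were denied on the source cluster.
    if fields.get("allowed") != "true":
        return None
    dst = fields.get("dst")
    return AuditOp(
        timestamp=match.group(1),
        user=fields.get("ugi", "").split(" ", 1)[0],
        cmd=fields.get("cmd", ""),
        src=fields.get("src", ""),
        dst=None if dst in (None, "null") else dst,
    )


if __name__ == "__main__":
    print(parse_audit_line(SAMPLE))
```

A replay driver would presumably order such records by timestamp and issue the corresponding client calls against the test NameNode; how Dynamometer actually schedules them is not stated in the thread.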