Re: LinkedIn Dynamometer Tool (was About 2.7.4 Release)

2017-07-21 Thread Erik Krogen
Hi Sanjay,

Actually I was not aware of that work… This seems to be a better way of 
achieving some of the same things we do externally to the DN process. I will 
look into reimplementing some parts on top of this; seems it should just 
require some very small extensions to DataNodeCluster. Thank you very much for 
the pointer!

Erik

On 7/21/17, 11:01 AM, "sanjay Radia"  wrote:

Erik
  Great stuff. 
BTW did you build on top of the “simulated data nodes” in HDFS which has a 
way of storing only the length of data (but not the real data)? That work allowed 
supplementing with a matching editsLog for the NN. Your approach of using a 
real image has the advantage of being able to replay traces from audit logs.
(Ref 
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DataNodeCluster.java)

thanks

sanjay

Re: LinkedIn Dynamometer Tool (was About 2.7.4 Release)

2017-07-21 Thread sanjay Radia
Erik
  Great stuff. 
BTW did you build on top of the “simulated data nodes” in HDFS which has a way 
of storing only the length of data (but not the real data)? That work allowed 
supplementing with a matching editsLog for the NN. Your approach of using a 
real image has the advantage of being able to replay traces from audit logs.
(Ref 
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DataNodeCluster.java)
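
A minimal sketch of the length-only storage idea mentioned above, written in Python for brevity. This is not the simulated DataNode code from HDFS; the class and method names (LengthOnlyBlockStore, write_block, read_block, block_report) are hypothetical. It only illustrates why such a store lets many DataNodes share one machine: each block costs one small integer instead of its payload.

```python
class LengthOnlyBlockStore:
    """Toy block store that remembers block lengths but discards the bytes.

    Hypothetical illustration only; not an HDFS API.
    """

    def __init__(self):
        self._lengths = {}  # block id -> number of bytes "stored"

    def write_block(self, block_id, data):
        # Record only how many bytes were written; the payload is dropped.
        self._lengths[block_id] = len(data)

    def block_length(self, block_id):
        return self._lengths[block_id]

    def read_block(self, block_id, fill=b"\x00"):
        # Reads return filler bytes of the recorded length, not the original data.
        return fill * self._lengths[block_id]

    def block_report(self):
        # What a simulated DN could report upstream: (id, length) pairs sorted by id.
        return sorted(self._lengths.items())


store = LengthOnlyBlockStore()
store.write_block(1001, b"x" * 4096)
store.write_block(1002, b"hello")
print(store.block_report())  # [(1001, 4096), (1002, 5)]
```

From the NameNode's point of view only block ids and lengths matter for reports, which is why discarding the payload still exercises the metadata path.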

thanks

sanjay
> On Jul 20, 2017, at 10:42 AM, Erik Krogen  
> wrote:
> 
> forking off of the 2.7.4 release thread to answer this question about
> Dynamometer
> 
> Dynamometer is a tool developed at LinkedIn for scale testing HDFS,
> specifically the NameNode. We have been using it for some time now and have
> recently been making some enhancements to ease of use and reproducibility.
> We hope to post a blog post sometime in the not-too-distant future, and
> also to open source it. I can provide some details here given that we have
> been leveraging it as part of our 2.7.4 release / upgrade process (in
> addition to previous upgrades).
> 
> The basic idea is to get full-scale black-box testing of the HDFS NN while
> using significantly less (~10%) hardware than a real cluster of that size
> would require. We use real NN images from our at-scale clusters paired with
> some logic to fake out DNs into thinking they are storing data when they
> are not, allowing us to stuff more DNs onto each machine. Since we use a
> real image, we can replay real traces (collected from audit logs) to
> compare actual production performance vs. performance on this simulated
> cluster (with additional tuning, different version, etc.). We leverage YARN
> to manage setting up this cluster and to replay the traces.
> 
> Happy to answer questions.
> 
> Erik

Re: LinkedIn Dynamometer Tool (was About 2.7.4 Release)

2017-07-20 Thread Anu Engineer
Hi Erik,

Looking forward to the release of this tool. Thank you very much for the 
contribution.

Had a couple of questions about how the tool works.

1. Would you be able to provide the traces along with this tool? In other 
words, would I be able to use this out of the box, or do I have to build up 
traces myself? 

2. Could you explain how the “fake out DNs into thinking they are storing data” 
part works? Or I can be patient and wait for your blog post.
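
For anyone wondering what building a trace from audit logs might involve, here is a rough Python sketch. It is an assumption, not Dynamometer's actual trace format or code: it presumes audit lines in the common log4j layout (timestamp, level, "FSNamesystem.audit:", then tab-separated key=value fields), and the names AuditOp and parse_audit_line as well as the sample line are made up for illustration.

```python
import re
from collections import namedtuple

# Hypothetical record for one replayable operation.
AuditOp = namedtuple("AuditOp", "timestamp ugi cmd src dst")

# Matches "key=value" fields; values run up to the next tab.
_FIELD = re.compile(r"(\w+)=([^\t]*)")


def parse_audit_line(line):
    """Return an AuditOp for one audit log line, or None if it does not qualify."""
    head, sep, body = line.partition("FSNamesystem.audit: ")
    if not sep:
        return None  # not an audit line
    fields = dict(_FIELD.findall(body.rstrip("\n")))
    if fields.get("allowed") != "true" or "cmd" not in fields:
        return None  # skip denied or malformed operations
    # The first two whitespace-separated tokens of the prefix are date and time.
    timestamp = " ".join(head.split()[:2])
    return AuditOp(timestamp, fields.get("ugi"), fields["cmd"],
                   fields.get("src"), fields.get("dst"))


# Illustrative line only; not taken from a real cluster.
sample = ("2017-07-20 10:42:00,123 INFO FSNamesystem.audit: allowed=true\t"
          "ugi=hdfs (auth:SIMPLE)\tip=/10.0.0.1\tcmd=open\t"
          "src=/data/part-0\tdst=null\tperm=null\tproto=rpc")
op = parse_audit_line(sample)
print(op.cmd, op.src)  # open /data/part-0
```

A replayer would then sort such records by timestamp and reissue each cmd against the test NameNode, preserving the relative timing.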

Thanks
Anu

On 7/20/17, 10:42 AM, "Erik Krogen"  wrote:

>forking off of the 2.7.4 release thread to answer this question about
>Dynamometer
>
>Dynamometer is a tool developed at LinkedIn for scale testing HDFS,
>specifically the NameNode. We have been using it for some time now and have
>recently been making some enhancements to ease of use and reproducibility.
>We hope to post a blog post sometime in the not-too-distant future, and
>also to open source it. I can provide some details here given that we have
>been leveraging it as part of our 2.7.4 release / upgrade process (in
>addition to previous upgrades).
>
>The basic idea is to get full-scale black-box testing of the HDFS NN while
>using significantly less (~10%) hardware than a real cluster of that size
>would require. We use real NN images from our at-scale clusters paired with
>some logic to fake out DNs into thinking they are storing data when they
>are not, allowing us to stuff more DNs onto each machine. Since we use a
>real image, we can replay real traces (collected from audit logs) to
>compare actual production performance vs. performance on this simulated
>cluster (with additional tuning, different version, etc.). We leverage YARN
>to manage setting up this cluster and to replay the traces.
>
>Happy to answer questions.
>
>Erik
>
>On Wed, Jul 19, 2017 at 5:05 PM, Konstantin Shvachko 
>wrote:
>
>> Hi Tianyi,
>>
>> Glad you are interested in Dynamometer. Erik (CC-ed) is actively working
>> on this project right now, I'll let him elaborate.
>> Erik, you should probably respond on Apache dev list, as I think it could
>> be interesting for other people as well, since we planned to open source
>> it. You can fork the "About 2.7.4 Release" thread with a new subject and
>> give some details about Dynamometer there.
>>
>> Thanks,
>> --Konstantin
>>
>> On Wed, Jul 19, 2017 at 1:40 AM, 何天一  wrote:
>>
>>> Hi, Shvachko.
>>>
>>> You mentioned an internal tool called Dynamometer to test NameNode
>>> performance earlier in the 2.7.4 release thread.
>>> I wonder if you could share some ideas behind the tool. Or is there a
>>> plan to bring Dynamometer to the open source community?
>>>
>>> Thanks.
>>>
>>> BR,
>>> Tianyi
>>>
>>> On Fri, Jul 14, 2017 at 8:45 AM Konstantin Shvachko 
>>> wrote:
>>>
>>>> Hi everybody.
>>>>
>>>> We have been doing some internal testing of Hadoop 2.7.4. The testing is
>>>> going well.
>>>> Did not find any major issues on our workloads.
>>>> Used an internal tool called Dynamometer to check NameNode performance on
>>>> real cluster traces. Good.
>>>> Overall test cluster performance looks good.
>>>> Some more testing is still going on.
>>>>
>>>> I plan to build an RC next week, if there are no objections.
>>>>
>>>> Thanks,
>>>> --Konst
>>>>
>>>> On Thu, Jun 15, 2017 at 4:42 PM, Konstantin Shvachko <
>>>> shv.had...@gmail.com>
>>>> wrote:
>>>>
>>>> > Hey guys.
>>>> >
>>>> > An update on 2.7.4 progress.
>>>> > We are down to 4 blockers. There is some work remaining on those.
>>>> > https://issues.apache.org/jira/browse/HDFS-11896?filter=12340814
>>>> > Would be good if people could follow up on review comments.
>>>> >
>>>> > I looked through nightly Jenkins build results for 2.7.4 both on Apache
>>>> > Jenkins and internal.
>>>> > Some tests fail intermittently, but there are no consistent failures. I filed
>>>> > HDFS-11985 to track some of them.
>>>> > https://issues.apache.org/jira/browse/HDFS-11985
>>>> > I do not currently consider these failures as blockers. LMK if some of
>>>> > them are.
>>>> >
>>>> > We started internal testing of branch-2.7 on one of our smallish (100+
>>>> > nodes) test clusters.
>>>> > Will update on the results.
>>>> >
>>>> > There is a plan to enable BigTop for 2.7.4 testing.
>>>> >
>>>> > Akira, Brahma, thank you for setting up a wiki page for the 2.7.4 release.
>>>> > Thank you everybody for contributing to this effort.
>>>> >
>>>> > Regards,
>>>> > --Konstantin
>>>> >
>>>> >
>>>> > On Tue, May 30, 2017 at 12:08 AM, Akira Ajisaka
>>>> > wrote:
>>>> >
>>>> >> Sure.
>>>> >> If you want to edit the wiki, please tell me your ASF confluence account.
>>>> >>
>>>> >> -Akira
>>>> >>
>>>> >> On 2017/05/30 15:31, Rohith Sharma K S wrote:
>>>> >>
>>>> >>> A couple more JIRAs need to be backported for the 2.7.4 release. These will
>>>> >>> solve RM HA instability issues.
>>>> >>> https://issues.apache.org/jira/browse/YARN-5333
>>>> >>>