Re: Flume benchmarks

2016-10-14 Thread Chris Horrocks
I've got a pretty well-resourced pre-production environment that might do the
trick quite nicely for these benchmarks.


-- Chris Horrocks



Re: Flume benchmarks

2016-10-13 Thread Roshan Naik
You may want to take a look at
- https://cwiki.apache.org/confluence/display/FLUME/Performance+Measurements+-+round+2

and the older
- https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements

when coming up with a list of configurations to benchmark.

-roshan





Re: Flume benchmarks

2016-10-13 Thread Balazs Donat Bessenyei
I have just proposed enabling Travis in a different thread. That should
help with this. (Having a separate machine would be best, but I don't know
how we could get one. I'll look into it.)



Re: Flume benchmarks

2016-10-13 Thread Lior Zeno
Maybe we should get an isolated environment? The CI environment might be shared
among multiple users, which would add too much noise to the performance tests.
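One way to put a number on that noise (an assumption, not something agreed in this thread) is to repeat the same run several times and compute the coefficient of variation of the per-run events per second, i.e. the sample standard deviation divided by the mean. The class `NoiseCheck` and the 5% cut-off in the example are hypothetical and only illustrate the check.

```java
import java.util.Arrays;

/**
 * Hypothetical noise check: run the same benchmark several times and see how
 * much the per-run throughput (events per second) varies.
 */
final class NoiseCheck {

    private NoiseCheck() {}

    /** Sample standard deviation divided by the mean of the runs. */
    static double coefficientOfVariation(double[] eventsPerSecond) {
        if (eventsPerSecond.length < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        double mean = Arrays.stream(eventsPerSecond).average().getAsDouble();
        if (mean <= 0.0) {
            throw new IllegalArgumentException("mean throughput must be positive");
        }
        double sumSquares = 0.0;
        for (double x : eventsPerSecond) {
            sumSquares += (x - mean) * (x - mean);
        }
        // n - 1 in the denominator: sample (not population) standard deviation.
        double stddev = Math.sqrt(sumSquares / (eventsPerSecond.length - 1));
        return stddev / mean;
    }

    /** True if the spread is within the chosen (assumed) cut-off, e.g. 0.05 for 5%. */
    static boolean isQuietEnough(double[] eventsPerSecond, double maxCv) {
        return coefficientOfVariation(eventsPerSecond) <= maxCv;
    }

    public static void main(String[] args) {
        double[] runs = {100_000, 101_000, 99_000}; // illustrative values, not measurements
        System.out.println("CV = " + coefficientOfVariation(runs));
        System.out.println("quiet enough: " + isQuietEnough(runs, 0.05));
    }
}
```

If the shared CI box regularly exceeds whatever cut-off is chosen while an isolated machine does not, that would be a concrete argument for the isolated environment.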



Re: Flume benchmarks

2016-10-13 Thread Lior Zeno
I think we can come up with an initial version with little effort.
The simplest scenario I can think of is running a Flume instance (with a
SeqGen source and a Null sink) for one minute and then reporting the average
number of events per second.
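A minimal Java sketch of the reporting step for that scenario, assuming the agent wires a Sequence Generator source (`type = seq`) through a memory channel to a Null sink (`type = null`), and that a cumulative counter such as the channel's `EventTakeSuccessCount` (exposed through Flume's JMX or HTTP monitoring) is sampled at the start and end of the one-minute run. `ThroughputCalc` and its method are hypothetical helpers, not part of Flume.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper for the "SeqGen source -> Null sink for one minute" run.
 * Assumed agent wiring (Flume properties, for reference only):
 *   a1.sources.r1.type = seq
 *   a1.channels.c1.type = memory
 *   a1.sinks.k1.type = null
 */
final class ThroughputCalc {

    private ThroughputCalc() {}

    /**
     * Average events per second between two samples of a cumulative,
     * monotonically increasing counter (assumed: the channel's
     * EventTakeSuccessCount) taken at the start and end of the run.
     */
    static double averageEventsPerSecond(long startCount, long endCount,
                                         long startNanos, long endNanos) {
        long elapsedNanos = endNanos - startNanos;
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("end time must be after start time");
        }
        if (endCount < startCount) {
            throw new IllegalArgumentException("counter went backwards; was the agent restarted?");
        }
        double seconds = elapsedNanos / (double) TimeUnit.SECONDS.toNanos(1);
        return (endCount - startCount) / seconds;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: the counter advances by 6,000,000 over 60 seconds.
        long start = 0L;
        long end = TimeUnit.SECONDS.toNanos(60);
        System.out.println(averageEventsPerSecond(1_000L, 6_001_000L, start, end));
    }
}
```

Differencing two samples, rather than dividing the total by agent uptime, lets the first sample be taken after a warm-up period so JVM start-up does not drag the average down.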



Re: Flume benchmarks

2016-10-13 Thread Balazs Donat Bessenyei
+1

I think this is a good idea!

How can I help with setting it up?



Re: Flume benchmarks

2016-10-13 Thread Attila Simon
Good idea! What would be required to set up something similar for Flume,
i.e. the initial time cost of setting up the infrastructure and the recurring
time cost of adding new use cases?

Cheers,
Attila





Flume benchmarks

2016-10-13 Thread Lior Zeno
Hi All,

Monitoring Flume's performance over time is an important step for any
production-level application. Benchmarking Flume on a nightly basis has
the following advantages:

* A better understanding of Flume's bottlenecks.
* Letting users compare Flume's performance with other solutions, such as
Logstash and Fluentd.
* A better understanding of how recent commits affect performance.

Logstash already runs various performance tests; more details are at this
link:
http://logstash-benchmarks.elastic.co/

I propose adding a few micro-benchmarks that plot Flume's TPS against date
(in the ideal case, of course, where the input and/or output do not
bottleneck the system), e.g. using the SeqGen source.

Thoughts?

Thanks
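A rough sketch of the "TPS vs date" tracking, assuming each nightly run yields one (date, events per second) point and that a night is flagged when its TPS falls more than a chosen fraction below the median of the preceding few nights. `NightlyHistory`, the window size and the 10% drop used in the example are hypothetical choices for illustration only.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical store of nightly results: one TPS value per date. */
final class NightlyHistory {

    /** One nightly data point. */
    static final class Point {
        final LocalDate date;
        final double eventsPerSecond;

        Point(LocalDate date, double eventsPerSecond) {
            this.date = date;
            this.eventsPerSecond = eventsPerSecond;
        }
    }

    // Assumed to be appended in chronological order, one point per night.
    private final List<Point> points = new ArrayList<>();

    void record(LocalDate date, double eventsPerSecond) {
        points.add(new Point(date, eventsPerSecond));
    }

    /**
     * Flags the latest night if its TPS is more than maxDrop (e.g. 0.10 = 10%)
     * below the median of up to `window` preceding nights.
     */
    boolean latestLooksLikeRegression(int window, double maxDrop) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be at least 1");
        }
        if (points.size() < 2) {
            return false; // nothing to compare against yet
        }
        int last = points.size() - 1;
        int from = Math.max(0, last - window);
        double[] previous = new double[last - from];
        for (int i = from; i < last; i++) {
            previous[i - from] = points.get(i).eventsPerSecond;
        }
        double baseline = median(previous);
        return points.get(last).eventsPerSecond < baseline * (1.0 - maxDrop);
    }

    private static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        NightlyHistory history = new NightlyHistory();
        LocalDate day = LocalDate.of(2016, 10, 13);
        double[] tps = {100_000, 102_000, 99_000, 101_000, 85_000}; // illustrative values only
        for (int i = 0; i < tps.length; i++) {
            history.record(day.plusDays(i), tps[i]);
        }
        System.out.println(history.latestLooksLikeRegression(3, 0.10));
    }
}
```

Comparing against the median of a few previous nights, rather than only the last one, makes a single noisy run less likely to trigger or mask a flag.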