Re: Burrow V3 - going down with memory issues

2018-03-02 Thread Todd Palino
Hey, Srinivasa. It sounds like you’re running an intermediate version of
the master branch (I remember that specific error as I was making some
changes). It should be resolved with the latest version of master. Can you
try pulling the latest master?

We’ll be cutting a new release version soon, as there have been a number of
changes and updates that are useful. In the meantime, you may find more
help if you post to our Gitter:
https://gitter.im/linkedin-Burrow/Lobby

-Todd


On Thu, Mar 1, 2018 at 11:58 PM, Srinivasa Balaji <
srinivasa_bal...@trimble.com> wrote:

> We are running Burrow version 3 for Kafka consumer lag monitoring.
>
> The issue is that the Burrow service is going down frequently, roughly every hour.
>
> Command used to start the Burrow service:
>
> # nohup $GOPATH/bin/Burrow --config-dir /opt/work/src/github.com/linkedin/Burrow/config 1>&2 &
>
> We are seeing the below-mentioned error in the log:
>
> 
>
> panic: interface conversion: interface {} is nil, not *storage.brokerOffset
>
> goroutine 85 [running]:
> github.com/linkedin/Burrow/core/internal/storage.(*InMemoryStorage).fetchTopic(0xc4201d65a0, 0xc42100b8f0, 0xc42034d800)
> /opt/work/src/github.com/linkedin/Burrow/core/internal/storage/inmemory.go:611 +0x3c8
> github.com/linkedin/Burrow/core/internal/storage.(*InMemoryStorage).(github.com/linkedin/Burrow/core/internal/storage.fetchTopic)-fm(0xc42100b8f0, 0xc42034d800)
> /opt/work/src/github.com/linkedin/Burrow/core/internal/storage/inmemory.go:182 +0x3e
> github.com/linkedin/Burrow/core/internal/storage.(*InMemoryStorage).requestWorker(0xc4201d65a0, 0x11, 0xc4201d4900)
> /opt/work/src/github.com/linkedin/Burrow/core/internal/storage/inmemory.go:190 +0x105b
> created by github.com/linkedin/Burrow/core/internal/storage.(*InMemoryStorage).Start
> /opt/work/src/github.com/linkedin/Burrow/core/internal/storage/inmemory.go:144 +0x28b
>
>
> 
>
>
> Kindly let us know your thoughts on this to fix it.
>
> ———-
> 
> Srinivasa Balaji L
> Principal Architect Cloud & DevOPS - TPaaS
> 10368, Westmoor Drive, Westminster, CO 80021
> *M*: +1(303) 324-9822 <+919790804422>
> *Email*: lsbal...@trimble.com
>



-- 
*Todd Palino*
Senior Staff Engineer, Site Reliability
Data Infrastructure Streaming



linkedin.com/in/toddpalino


RE: Consultant Help

2018-03-02 Thread Matt Stone
Thank you, I will look into that.

-Original Message-
From: Svante Karlsson [mailto:svante.karls...@csi.se] 
Sent: Friday, March 2, 2018 1:50 PM
To: users@kafka.apache.org
Subject: Re: Consultant Help

try https://www.confluent.io/ - that's what they do

/svante

2018-03-02 21:21 GMT+01:00 Matt Stone :

> We are looking for a consultant or contractor that can come onsite to 
> our Ogden, Utah location in the US, to help with a Kafka set up and 
> maintenance project.  What we need is someone with the knowledge and 
> experience to build out the Kafka environment from scratch.
>
> We are thinking they would need to be onsite for 6-12 months  to set 
> it up, and mentor some of our team so they can get up to speed to do 
> the maintenance once the contractor is gone.  If anyone has the 
> experience setting up Kafka from scratch in a Linux environment, 
> maintain node clusters, and help train others on the team how to do 
> it, and you are interested in a long term project working at the 
> client site, I would love to start up  a discussion, to see if we could use 
> you for the role.
>
> I would also be interested in hearing about any consulting firms that 
> might have resources that could help with this role.
>
> Matt Stone
>
>
> -Original Message-
> From: Matt Daum [mailto:m...@setfive.com]
> Sent: Friday, March 2, 2018 1:11 PM
> To: users@kafka.apache.org
> Subject: Re: Kafka Setup for Daily counts on wide array of keys
>
> Actually it looks like the better way would be to output the counts to 
> a new topic then ingest that topic into the DB itself.  Is that the 
> correct way?
>
> On Fri, Mar 2, 2018 at 9:24 AM, Matt Daum  wrote:
>
> > I am new to Kafka but I think I have a good use case for it.  I am 
> > trying to build daily counts of requests based on a number of 
> > different attributes in a high throughput system (~1 million 
> > requests/sec. across all  8 servers).  The different attributes are 
> > unbounded in terms of values, and some will spread across 100's of 
> > millions values.  This is my current through process, let me know 
> > where I could be more efficient or if there is a better way to do it.
> >
> > I'll create an AVRO object "Impression" which has all the attributes 
> > of the inbound request.  My application servers then will on each 
> > request create and send this to a single kafka topic.
> >
> > I'll then have a consumer which creates a stream from the topic.  
> > From there I'll use the windowed timeframes and groupBy to group by 
> > the attributes on each given day.  At the end of the day I'd need to 
> > read out the data store to an external system for storage.  Since I 
> > won't know all the values I'd need something similar to the 
> > KVStore.all() but for WindowedKV Stores.  This appears that it'd be 
> > possible in 1.1 with this
> > commit: https://github.com/apache/kafka/commit/
> > 1d1c8575961bf6bce7decb049be7f10ca76bd0c5 .
> >
> > Is this the best approach to doing this?  Or would I be better using 
> > the stream to listen and then an external DB like Aerospike to store 
> > the counts and read out of it directly end of day.
> >
> > Thanks for the help!
> > Daum
> >
>


Resetting connect-offsets postion Using Kafka Connect and Debezium

2018-03-02 Thread Joe Hammerman
Hi Apache Kafka users email distribution list,

I'm trying to post to the connect-offsets topic a message with an LSN from the
past. I dump the connect-offsets topic with the following command:

./kafka-console-consumer.sh --bootstrap-server  --consumer.config ../config/consumer.properties --property print.key=true --new-consumer --topic connect-offsets


(config/consumer.properties has a group.id and SSL configuration
information).

Then I'll extract a message and, keeping all the key & value fields the
same except for substituting an old LSN, I turn off Kafka Connect and
produce to the connect-offsets topic with a connection formed by the
following command:

[root@billing-kafka001 bin]# ./kafka-console-producer.sh --broker-list  --topic connect-offsets --property "key.separator=," --property "parse.key=true" --property "compression.codec=1" --producer.config ../config/producer.properties

(producer.properties has only SSL configuration settings).

The messages look like this:

["postgres-events-connector",{"server":"staging-billing"}]{"last_snapshot_record":true,"lsn":1275604312,"txId":2125,"ts_usec":1520019207386333000,"snapshot":true}

When Kafka Connect is restarted, however, the offset source_info reported
is that of the last commit issued by the producer, not my reset point.

Am I obtaining the key in the wrong fashion? Should txId be reset to the
value of the commit I obtained the LSN from (I tested this and my results were
no better)? Am I using the correct compression codec for the topic?

Any assistance would be greatly appreciated!!

Thanks in advance for any assistance anyone can provide,
Joseph Hammerman


Re: Consultant Help

2018-03-02 Thread Svante Karlsson
try https://www.confluent.io/ - that's what they do

/svante

2018-03-02 21:21 GMT+01:00 Matt Stone :

> We are looking for a consultant or contractor that can come onsite to our
> Ogden, Utah location in the US, to help with a Kafka set up and maintenance
> project.  What we need is someone with the knowledge and experience to
> build out the Kafka environment from scratch.
>
> We are thinking they would need to be onsite for 6-12 months  to set it
> up, and mentor some of our team so they can get up to speed to do the
> maintenance once the contractor is gone.  If anyone has the experience
> setting up Kafka from scratch in a Linux environment, maintain node
> clusters, and help train others on the team how to do it, and you are
> interested in a long term project working at the client site, I would love
> to start up  a discussion, to see if we could use you for the role.
>
> I would also be interested in hearing about any consulting firms that
> might have resources that could help with this role.
>
> Matt Stone
>
>
> -Original Message-
> From: Matt Daum [mailto:m...@setfive.com]
> Sent: Friday, March 2, 2018 1:11 PM
> To: users@kafka.apache.org
> Subject: Re: Kafka Setup for Daily counts on wide array of keys
>
> Actually it looks like the better way would be to output the counts to a
> new topic then ingest that topic into the DB itself.  Is that the correct
> way?
>
> On Fri, Mar 2, 2018 at 9:24 AM, Matt Daum  wrote:
>
> > I am new to Kafka but I think I have a good use case for it.  I am
> > trying to build daily counts of requests based on a number of
> > different attributes in a high throughput system (~1 million
> > requests/sec. across all  8 servers).  The different attributes are
> > unbounded in terms of values, and some will spread across 100's of
> > millions values.  This is my current through process, let me know
> > where I could be more efficient or if there is a better way to do it.
> >
> > I'll create an AVRO object "Impression" which has all the attributes
> > of the inbound request.  My application servers then will on each
> > request create and send this to a single kafka topic.
> >
> > I'll then have a consumer which creates a stream from the topic.  From
> > there I'll use the windowed timeframes and groupBy to group by the
> > attributes on each given day.  At the end of the day I'd need to read
> > out the data store to an external system for storage.  Since I won't
> > know all the values I'd need something similar to the KVStore.all()
> > but for WindowedKV Stores.  This appears that it'd be possible in 1.1
> > with this
> > commit: https://github.com/apache/kafka/commit/
> > 1d1c8575961bf6bce7decb049be7f10ca76bd0c5 .
> >
> > Is this the best approach to doing this?  Or would I be better using
> > the stream to listen and then an external DB like Aerospike to store
> > the counts and read out of it directly end of day.
> >
> > Thanks for the help!
> > Daum
> >
>


Consultant Help

2018-03-02 Thread Matt Stone
We are looking for a consultant or contractor who can come onsite to our 
Ogden, Utah location in the US to help with a Kafka setup and maintenance 
project.  What we need is someone with the knowledge and experience to build 
out the Kafka environment from scratch.

We are thinking they would need to be onsite for 6-12 months to set it up and 
mentor some of our team so they can get up to speed to do the maintenance once 
the contractor is gone.  If anyone has experience setting up Kafka from 
scratch in a Linux environment, maintaining node clusters, and training others 
on the team how to do it, and you are interested in a long-term project working 
at the client site, I would love to start up a discussion to see if we could 
use you for the role.

I would also be interested in hearing about any consulting firms that might 
have resources that could help with this role.

Matt Stone


-Original Message-
From: Matt Daum [mailto:m...@setfive.com] 
Sent: Friday, March 2, 2018 1:11 PM
To: users@kafka.apache.org
Subject: Re: Kafka Setup for Daily counts on wide array of keys

Actually it looks like the better way would be to output the counts to a new 
topic then ingest that topic into the DB itself.  Is that the correct way?

On Fri, Mar 2, 2018 at 9:24 AM, Matt Daum  wrote:

> I am new to Kafka but I think I have a good use case for it.  I am 
> trying to build daily counts of requests based on a number of 
> different attributes in a high throughput system (~1 million 
> requests/sec. across all  8 servers).  The different attributes are 
> unbounded in terms of values, and some will spread across 100's of 
> millions values.  This is my current through process, let me know 
> where I could be more efficient or if there is a better way to do it.
>
> I'll create an AVRO object "Impression" which has all the attributes 
> of the inbound request.  My application servers then will on each 
> request create and send this to a single kafka topic.
>
> I'll then have a consumer which creates a stream from the topic.  From 
> there I'll use the windowed timeframes and groupBy to group by the 
> attributes on each given day.  At the end of the day I'd need to read 
> out the data store to an external system for storage.  Since I won't 
> know all the values I'd need something similar to the KVStore.all() 
> but for WindowedKV Stores.  This appears that it'd be possible in 1.1 
> with this
> commit: https://github.com/apache/kafka/commit/
> 1d1c8575961bf6bce7decb049be7f10ca76bd0c5 .
>
> Is this the best approach to doing this?  Or would I be better using 
> the stream to listen and then an external DB like Aerospike to store 
> the counts and read out of it directly end of day.
>
> Thanks for the help!
> Daum
>


Re: Kafka Setup for Daily counts on wide array of keys

2018-03-02 Thread Matt Daum
Actually, it looks like the better way would be to output the counts to a
new topic and then ingest that topic into the DB itself.  Is that the correct
way?
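
For reference, a hedged sketch of the "ingest that topic into the DB" side: a plain Kafka consumer reading the counts topic and upserting each record into the DB. The topic name, group id, serde choice (String keys, Long counts), and the upsert helper are all assumptions for illustration, written against the 1.x-era consumer API; the real loader depends on the DB chosen.

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CountsLoaderSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "daily-counts-loader");     // hypothetical
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());

        try (KafkaConsumer<String, Long> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("daily-counts"));    // hypothetical topic
            while (true) {
                ConsumerRecords<String, Long> records = consumer.poll(1000);
                for (ConsumerRecord<String, Long> record : records) {
                    // Upsert (attribute@windowStart -> count) into the target DB here.
                    upsert(record.key(), record.value());
                }
            }
        }
    }

    // Hypothetical DB write; the real implementation depends on the store chosen.
    private static void upsert(String key, long count) { }
}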

On Fri, Mar 2, 2018 at 9:24 AM, Matt Daum  wrote:

> I am new to Kafka but I think I have a good use case for it.  I am trying
> to build daily counts of requests based on a number of different attributes
> in a high throughput system (~1 million requests/sec. across all  8
> servers).  The different attributes are unbounded in terms of values, and
> some will spread across 100's of millions values.  This is my current
> through process, let me know where I could be more efficient or if there is
> a better way to do it.
>
> I'll create an AVRO object "Impression" which has all the attributes of
> the inbound request.  My application servers then will on each request
> create and send this to a single kafka topic.
>
> I'll then have a consumer which creates a stream from the topic.  From
> there I'll use the windowed timeframes and groupBy to group by the
> attributes on each given day.  At the end of the day I'd need to read out
> the data store to an external system for storage.  Since I won't know all
> the values I'd need something similar to the KVStore.all() but for
> WindowedKV Stores.  This appears that it'd be possible in 1.1 with this
> commit: https://github.com/apache/kafka/commit/
> 1d1c8575961bf6bce7decb049be7f10ca76bd0c5 .
>
> Is this the best approach to doing this?  Or would I be better using the
> stream to listen and then an external DB like Aerospike to store the counts
> and read out of it directly end of day.
>
> Thanks for the help!
> Daum
>


Re: Re: which Kafka StateStore could I use ?

2018-03-02 Thread 杰 杨
Can you show some tips for this?

--- Original Message ---
From: "Guozhang Wang "
Sent: March 3, 2018, 01:32:55
To: "users";
Subject: Re: Re: which Kafka StateStore could I use ?


Hello Jie,

By default Kafka Streams uses caching on top of its internal state stores
to de-dup output streams to the final destination (in your case the DB) so
that for a single key, fewer updates will be generated giving a small
working set. If your aggregation logic follows such key distribution, you
can try enlarge the cache size (by default it is only 50MB) and see if it
helps reduce the downstream traffic to your DB.


Guozhang


On Thu, Mar 1, 2018 at 6:33 PM, 杰 杨  wrote:

> Yes .but the DB’s Concurrent quantity is  the limitation.
> Now I can process 600 records/second
> And I want enhance it
>
> Sent from Mail for Windows 10
>
> From: Guozhang Wang
> Sent: March 2, 2018, 2:59
> To: users@kafka.apache.org
> Subject: Re: which Kafka StateStore could I use ?
>
> Hello Jie,
>
> Just to understand your problem better, are you referring "db" for an
> external storage engine outside Kafka Streams, and you are asking how to
> only send one record per aggregation key (assuming you are doing some
> aggregations with Streams' statestore) to that end storage engine?
>
>
> Guozhang
>
>
> On Wed, Feb 28, 2018 at 7:53 PM, 杰 杨  wrote:
>
> >
> > HI:
> > I use kafka streams for real-time data analysis
> > and I meet a problem.
> > now I process a record in kafka and compute it and send to db.
> > but db concurrency level is not suit for me.
> > so I want that
> > 1)when there is not data in kakfa ,the statestore is  no results.
> > 2) when there is a lot of data records in kafka the statestore save
> > computed result and I need send its once to db.
> > which StateStoe can I use for do that above
> > 
> > funk...@live.com
> >
>
>
>
> --
> -- Guozhang
>
>


--
-- Guozhang



Re: Re: which Kafka StateStore could I use ?

2018-03-02 Thread 杰 杨
I am using MongoDB, but I need to run 10 update operations for one record.
Processing one record takes 20 ms on one thread.

--- Original Message ---
From: "Ted Yu "
Sent: March 3, 2018, 01:37:13
To: "users";
Subject: Re: Re: which Kafka StateStore could I use ?


Jie:
Which DB are you using ?

600 records/second is very low rate.

Probably your DB needs some tuning.

Cheers

On Fri, Mar 2, 2018 at 9:32 AM, Guozhang Wang  wrote:

> Hello Jie,
>
> By default Kafka Streams uses caching on top of its internal state stores
> to de-dup output streams to the final destination (in your case the DB) so
> that for a single key, fewer updates will be generated giving a small
> working set. If your aggregation logic follows such key distribution, you
> can try enlarge the cache size (by default it is only 50MB) and see if it
> helps reduce the downstream traffic to your DB.
>
>
> Guozhang
>
>
> On Thu, Mar 1, 2018 at 6:33 PM, 杰 杨  wrote:
>
> > Yes .but the DB’s Concurrent quantity is  the limitation.
> > Now I can process 600 records/second
> > And I want enhance it
> >
> > Sent from Mail for Windows 10
> >
> > From: Guozhang Wang
> > Sent: March 2, 2018, 2:59
> > To: users@kafka.apache.org
> > Subject: Re: which Kafka StateStore could I use ?
> >
> > Hello Jie,
> >
> > Just to understand your problem better, are you referring "db" for an
> > external storage engine outside Kafka Streams, and you are asking how to
> > only send one record per aggregation key (assuming you are doing some
> > aggregations with Streams' statestore) to that end storage engine?
> >
> >
> > Guozhang
> >
> >
> > On Wed, Feb 28, 2018 at 7:53 PM, 杰 杨  wrote:
> >
> > >
> > > HI:
> > > I use kafka streams for real-time data analysis
> > > and I meet a problem.
> > > now I process a record in kafka and compute it and send to db.
> > > but db concurrency level is not suit for me.
> > > so I want that
> > > 1)when there is not data in kakfa ,the statestore is  no results.
> > > 2) when there is a lot of data records in kafka the statestore save
> > > computed result and I need send its once to db.
> > > which StateStoe can I use for do that above
> > > 
> > > funk...@live.com
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
> >
>
>
> --
> -- Guozhang
>



Re: Re: which Kafka StateStore could I use ?

2018-03-02 Thread Ted Yu
Jie:
Which DB are you using ?

600 records/second is very low rate.

Probably your DB needs some tuning.

Cheers

On Fri, Mar 2, 2018 at 9:32 AM, Guozhang Wang  wrote:

> Hello Jie,
>
> By default Kafka Streams uses caching on top of its internal state stores
> to de-dup output streams to the final destination (in your case the DB) so
> that for a single key, fewer updates will be generated giving a small
> working set. If your aggregation logic follows such key distribution, you
> can try enlarge the cache size (by default it is only 50MB) and see if it
> helps reduce the downstream traffic to your DB.
>
>
> Guozhang
>
>
> On Thu, Mar 1, 2018 at 6:33 PM, 杰 杨  wrote:
>
> > Yes .but the DB’s Concurrent quantity is  the limitation.
> > Now I can process 600 records/second
> > And I want enhance it
> >
> > Sent from Mail for Windows 10
> >
> > From: Guozhang Wang
> > Sent: March 2, 2018, 2:59
> > To: users@kafka.apache.org
> > Subject: Re: which Kafka StateStore could I use ?
> >
> > Hello Jie,
> >
> > Just to understand your problem better, are you referring "db" for an
> > external storage engine outside Kafka Streams, and you are asking how to
> > only send one record per aggregation key (assuming you are doing some
> > aggregations with Streams' statestore) to that end storage engine?
> >
> >
> > Guozhang
> >
> >
> > On Wed, Feb 28, 2018 at 7:53 PM, 杰 杨  wrote:
> >
> > >
> > > HI:
> > > I use kafka streams for real-time data analysis
> > > and I meet a problem.
> > > now I process a record in kafka and compute it and send to db.
> > > but db concurrency level is not suit for me.
> > > so I want that
> > > 1)when there is not data in kakfa ,the statestore is  no results.
> > > 2) when there is a lot of data records in kafka the statestore save
> > > computed result and I need send its once to db.
> > > which StateStoe can I use for do that above
> > > 
> > > funk...@live.com
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
> >
>
>
> --
> -- Guozhang
>


Re: Re: which Kafka StateStore could I use ?

2018-03-02 Thread Guozhang Wang
Hello Jie,

By default Kafka Streams uses caching on top of its internal state stores
to de-dup output streams to the final destination (in your case the DB), so
that for a single key fewer updates will be generated, giving a smaller
working set. If your aggregation logic follows such a key distribution, you
can try enlarging the cache size (by default it is only 50MB) and see if it
helps reduce the downstream traffic to your DB.


Guozhang
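
For reference, a hedged sketch of the two Streams settings involved in the suggestion above (the values are illustrative, not recommendations): the record-cache size, plus the commit interval, since the cache is also flushed on every commit.

import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class CacheSizeSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Total record-cache bytes shared by all stream threads of this instance
        // (100 MB here is purely illustrative).
        props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 100 * 1024 * 1024L);
        // The cache is also flushed on every commit, so a longer commit interval
        // lets more updates per key collapse before they reach the downstream DB.
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30 * 1000L);
        // ... plus application.id, bootstrap.servers, and the topology as usual.
    }
}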


On Thu, Mar 1, 2018 at 6:33 PM, 杰 杨  wrote:

> Yes .but the DB’s Concurrent quantity is  the limitation.
> Now I can process 600 records/second
> And I want enhance it
>
> Sent from Mail for Windows 10
>
> From: Guozhang Wang
> Sent: March 2, 2018, 2:59
> To: users@kafka.apache.org
> Subject: Re: which Kafka StateStore could I use ?
>
> Hello Jie,
>
> Just to understand your problem better, are you referring "db" for an
> external storage engine outside Kafka Streams, and you are asking how to
> only send one record per aggregation key (assuming you are doing some
> aggregations with Streams' statestore) to that end storage engine?
>
>
> Guozhang
>
>
> On Wed, Feb 28, 2018 at 7:53 PM, 杰 杨  wrote:
>
> >
> > HI:
> > I use kafka streams for real-time data analysis
> > and I meet a problem.
> > now I process a record in kafka and compute it and send to db.
> > but db concurrency level is not suit for me.
> > so I want that
> > 1)when there is not data in kakfa ,the statestore is  no results.
> > 2) when there is a lot of data records in kafka the statestore save
> > computed result and I need send its once to db.
> > which StateStoe can I use for do that above
> > 
> > funk...@live.com
> >
>
>
>
> --
> -- Guozhang
>
>


-- 
-- Guozhang


Kafka Setup for Daily counts on wide array of keys

2018-03-02 Thread Matt Daum
I am new to Kafka, but I think I have a good use case for it.  I am trying
to build daily counts of requests based on a number of different attributes
in a high-throughput system (~1 million requests/sec. across all 8
servers).  The different attributes are unbounded in terms of values, and
some will spread across hundreds of millions of values.  This is my current
thought process; let me know where I could be more efficient or if there is
a better way to do it.

I'll create an Avro object "Impression" which has all the attributes of the
inbound request.  On each request, my application servers will then create
one and send it to a single Kafka topic.

I'll then have a consumer which creates a stream from the topic.  From
there I'll use the windowed timeframes and groupBy to group by the
attributes on each given day.  At the end of the day I'd need to read the
data store out to an external system for storage.  Since I won't know all
the values, I'd need something similar to KVStore.all() but for
windowed KV stores.  It appears this will be possible in 1.1 with this
commit:
https://github.com/apache/kafka/commit/1d1c8575961bf6bce7decb049be7f10ca76bd0c5

Is this the best approach to doing this?  Or would I be better off using the
stream to listen, and an external DB like Aerospike to store the counts
and read out of it directly at the end of the day?

Thanks for the help!
Daum
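
For reference, a minimal Kafka Streams sketch of the windowed daily count described above, written against the 1.0/1.1-era Streams API. The topic names, store name, application id, and the extractAttribute helper are hypothetical, and String serdes stand in for the Avro "Impression"; the trailing comment points at the window-store all() access that the cited commit appears to add for 1.1.

import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.Consumed;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Serialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.state.WindowStore;

public class DailyCountsSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "daily-counts-app");   // hypothetical
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // hypothetical

        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("impressions", Consumed.with(Serdes.String(), Serdes.String()))
               // Re-key by the attribute being counted (stand-in for a field of the Avro Impression).
               .selectKey((key, value) -> extractAttribute(value))
               .groupByKey(Serialized.with(Serdes.String(), Serdes.String()))
               // One-day tumbling windows -> per-attribute daily counts.
               .windowedBy(TimeWindows.of(TimeUnit.DAYS.toMillis(1)))
               .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("daily-counts-store"))
               // Also emit the running counts to an output topic an external loader can ingest.
               .toStream((windowedKey, count) ->
                       windowedKey.key() + "@" + windowedKey.window().start())
               .to("daily-counts", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        // End-of-day export alternative: once 1.1's ReadOnlyWindowStore.all() is available,
        // streams.store("daily-counts-store", QueryableStoreTypes.windowStore()).all()
        // can iterate every windowed key for the external dump.
    }

    // Hypothetical helper: pull the attribute to count out of the record value.
    private static String extractAttribute(String value) {
        return value; // placeholder
    }
}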


Re: [kafka-clients] Re: [VOTE] 1.1.0 RC0

2018-03-02 Thread Damian Guy
Thanks Jun

On Fri, 2 Mar 2018 at 02:25 Jun Rao  wrote:

> KAFKA-6111 is now merged to 1.1 branch.
>
> Thanks,
>
> Jun
>
> On Thu, Mar 1, 2018 at 2:50 PM, Jun Rao  wrote:
>
>> Hi, Damian,
>>
>> It would also be useful to include KAFKA-6111, which prevents the
>> deleteLogDirEventNotifications path from being deleted correctly from
>> Zookeeper. The patch should be committed later today.
>>
>> Thanks,
>>
>> Jun
>>
>> On Thu, Mar 1, 2018 at 1:47 PM, Damian Guy  wrote:
>>
>>> Thanks Jason. Assuming the system tests pass i'll cut RC1 tomorrow.
>>>
>>> Thanks,
>>> Damian
>>>
>>> On Thu, 1 Mar 2018 at 19:10 Jason Gustafson  wrote:
>>>
 The fix has been merged to 1.1.

 Thanks,
 Jason

On Wed, Feb 28, 2018 at 11:35 AM, Damian Guy  wrote:

 > Hi Jason,
 >
 > Ok - thanks. Let me know how you get on.
 >
 > Cheers,
 > Damian
 >
> On Wed, 28 Feb 2018 at 19:23 Jason Gustafson  wrote:
 >
 > > Hey Damian,
 > >
 > > I think we should consider
 > > https://issues.apache.org/jira/browse/KAFKA-6593
> > for the release. I have a patch available, but still working on
> > validating both the bug and the fix.
 > >
 > > -Jason
 > >
> > On Wed, Feb 28, 2018 at 9:34 AM, Matthias J. Sax <matth...@confluent.io>
> > wrote:
 > >
 > > > No. Both will be released.
 > > >
 > > > -Matthias
 > > >
 > > > On 2/28/18 6:32 AM, Marina Popova wrote:
> > > > Sorry, maybe a stupid question, but:
> > > > I see that Kafka 1.0.1 RC2 is still not released, but now 1.1.0 RC0 is
> > > > coming up...
> > > > Does it mean 1.0.1 will be abandoned and we should be looking forward
> > > > to 1.1.0 instead?
> > > >
> > > > thanks!
> > > >
> > > > Sent with ProtonMail Secure Email.
> > > >
> > > > --- Original Message ---
> > > >
> > > > On February 26, 2018 6:28 PM, Vahid S Hashemian <
> > > > vahidhashem...@us.ibm.com> wrote:
 > > > >
> > > >> +1 (non-binding)
> > > >>
> > > >> Built the source and ran quickstart (including streams) successfully on
> > > >> Ubuntu (with both Java 8 and Java 9).
> > > >>
> > > >> I understand the Windows platform is not officially supported, but I ran
> > > >> the same on Windows 10, and except for Step 7 (Connect) everything else
> > > >> worked fine.
> > > >>
> > > >> There are a number of warnings and errors (including
> > > >> java.lang.ClassNotFoundException). Here's the final error message:
> > > >>
> > > >>> bin\windows\connect-standalone.bat config\connect-standalone.properties
> > > >>> config\connect-file-source.properties config\connect-file-sink.properties
> > > >>
> > > >> ...
> > > >>
> > > >> [2018-02-26 14:55:56,529] ERROR Stopping after connector error
> > > >> (org.apache.kafka.connect.cli.ConnectStandalone)
> > > >> java.lang.NoClassDefFoundError: org/apache/kafka/connect/transforms/util/RegexValidator
> > > >> at org.apache.kafka.connect.runtime.SinkConnectorConfig.<clinit>(SinkConnectorConfig.java:46)
> > > >> at org.apache.kafka.connect.runtime.AbstractHerder.validateConnectorConfig(AbstractHerder.java:263)
> > > >> at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:164)
> > > >> at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:107)
> > > >> Caused by: java.lang.ClassNotFoundException: org.apache.kafka.connect.transforms.util.RegexValidator
> > > >> at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:582)
> > > >> at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:185)
> > > >> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:496)
> > > >> ... 4 more
> > > >>
> > > >> Thanks for running the release.
> > > >>
> > > >> --Vahid
> > > >>
> > > >> From: Damian Guy damian@gmail.com
> > > >> To: d...@kafka.apache.org, users@kafka.apache.org,
> > > >> kafka-clie...@googlegroups.com
> > > >> Date: 02/24/2018 08:16 AM
> > > >>

Choosing topic/partition formula

2018-03-02 Thread adrien ruffie
Hi all,


I am having difficulty working through an example of the calculation for the
following formula.


Based on throughput requirements one can pick a rough number of partitions.

  1.  Let's call the throughput from a producer to a single partition P
  2.  Throughput from a single partition to a consumer is C
  3.  Target throughput is T
  4.  Required partitions = Max (T/P, T/C)


The formula is in the "Choosing topic/partition" part of this link:

https://community.hortonworks.com/articles/80813/kafka-best-practices-1.html


Could someone explain the calculation to me, showing it step by step, please?
I do not remember how the formula applies visually...

Maths are so far away :-)


Example: T = 20 MB/s, P = 5 MB/s and C = 3 MB/s


==> Max(20/5, 20/3) = ???  ==> Is it 20/5, because that is the maximum of the two?
Consequently do I need 4 partitions for my topic? But doesn't it also depend on
the number of producers & consumers?
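
A worked reading of the formula with these example numbers, as a sketch; rounding the fractional result up to a whole partition count is the only added assumption.

public class PartitionCountExample {
    public static void main(String[] args) {
        double T = 20.0;  // target throughput, MB/s
        double P = 5.0;   // producer throughput into a single partition, MB/s
        double C = 3.0;   // consumer throughput out of a single partition, MB/s
        // Required partitions = Max(T/P, T/C): take the larger ratio, i.e. the slower side.
        double required = Math.max(T / P, T / C);       // max(4.0, 6.67) = 6.67
        System.out.println((int) Math.ceil(required));  // prints 7
    }
}

With these numbers the consumer side is the limiting factor, so the formula points to roughly 7 partitions rather than 4.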




Thank you all.


Best regards,


Adrien