Can't use different KDCs/realms for Kafka Connect and HDFS Sink Connector?

2019-03-19 Thread Ashika Umanga Umagiliya
Greetings,

I have set up Kafka Connect against a kerberized Kafka cluster (say the KDC
is "kafka-auth101.hadoop.local" and the realm is "KAFKA.MYCOMPANY.COM").

Now I am trying to set up the HDFS Sink connector against a kerberized
Hadoop cluster with a different KDC (say the KDC is
"hadoop-auth101.hadoop.local" and the realm is "HADOOP.MYCOMPANY.COM").

I have added both of these realms to the krb5.conf used by Kafka Connect.
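
For context, this is roughly the shape of that krb5.conf (the realm and KDC
names are the ones above; the default realm choice and port 88 are just
illustrative, not my exact file):

[libdefaults]
    default_realm = KAFKA.MYCOMPANY.COM

[realms]
    KAFKA.MYCOMPANY.COM = {
        kdc = kafka-auth101.hadoop.local:88
    }
    HADOOP.MYCOMPANY.COM = {
        kdc = hadoop-auth101.hadoop.local:88
    }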

But during initialisation, the HDFS Sink connector instance fails with the
error below:

Any tips on this?

>>> KdcAccessibility: reset
>>> KeyTabInputStream, readName(): HADOOP.MYCOMPANY.COM
>>> KeyTabInputStream, readName(): hdfsuser
>>> KeyTab: load() entry length: 85; type: 18
Looking for keys for: hdfsu...@hadoop.mycompany.com
Found unsupported keytype (18) for hdfsu...@hadoop.mycompany.com
[2019-03-19 07:21:12,330] INFO Couldn't start HdfsSinkConnector: (io.confluent.connect.hdfs.HdfsSinkTask)
org.apache.kafka.connect.errors.ConnectException: java.io.IOException: Login failure for hdfsu...@hadoop.mycompany.com from keytab /etc/hadoop/keytab/stg.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user

at io.confluent.connect.hdfs.DataWriter.<init>(DataWriter.java:202)
at io.confluent.connect.hdfs.HdfsSinkTask.start(HdfsSinkTask.java:76)
at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:232)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:145)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:190)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Login failure for hdfsu...@hadoop.mycompany.com from keytab /etc/hadoop/keytab/stg.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user

at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:963)
at io.confluent.connect.hdfs.DataWriter.<init>(DataWriter.java:127)
... 10 more
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user


Re: [VOTE] 2.2.0 RC2

2019-03-19 Thread Satish Duggana
+1 (non-binding)

- Ran testAll/releaseTarGzAll successfully with no failures.
- Ran through the quickstart of core/streams on builds generated from the
2.2.0-rc2 tag.
- Ran a few internal apps targeting topics on a 3-node cluster.

Thanks for running the release Matthias!

On Wed, Mar 20, 2019 at 12:43 AM Manikumar 
wrote:

> +1 (non-binding)
>
> - Verified the artifacts, build from src, ran tests
> - Verified the quickstart, ran producer/consumer performance tests.
>
> Thanks for running the release!
>
> Thanks,
> Manikumar
>
> On Wed, Mar 20, 2019 at 12:19 AM David Arthur 
> wrote:
>
> > +1
> >
> > Validated signatures, and ran through quick-start.
> >
> > Thanks!
> >
> > On Mon, Mar 18, 2019 at 4:00 AM Jakub Scholz  wrote:
> >
> > > +1 (non-binding). I used the staged binaries and run some of my tests
> > > against them. All seems to look good to me.
> > >
> > > On Sat, Mar 9, 2019 at 11:56 PM Matthias J. Sax  >
> > > wrote:
> > >
> > > > Hello Kafka users, developers and client-developers,
> > > >
> > > > This is the third candidate for release of Apache Kafka 2.2.0.
> > > >
> > > >  - Added SSL support for custom principal name
> > > >  - Allow SASL connections to periodically re-authenticate
> > > >  - Command line tool bin/kafka-topics.sh adds AdminClient support
> > > >  - Improved consumer group management
> > > >- default group.id is `null` instead of empty string
> > > >  - API improvement
> > > >- Producer: introduce close(Duration)
> > > >- AdminClient: introduce close(Duration)
> > > >- Kafka Streams: new flatTransform() operator in Streams DSL
> > > >- KafkaStreams (and other classes) now implement AutoCloseable to
> > > > support try-with-resources
> > > >- New Serdes and default method implementations
> > > >  - Kafka Streams exposed internal client.id via ThreadMetadata
> > > >  - Metric improvements:  All `-min`, `-avg` and `-max` metrics will
> now
> > > > output `NaN` as default value
> > > > Release notes for the 2.2.0 release:
> > > > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/RELEASE_NOTES.html
> > > >
> > > > *** Please download, test, and vote by Thursday, March 14, 9am PST.
> > > >
> > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > https://kafka.apache.org/KEYS
> > > >
> > > > * Release artifacts to be voted upon (source and binary):
> > > > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/
> > > >
> > > > * Maven artifacts to be voted upon:
> > > >
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > > >
> > > > * Javadoc:
> > > > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/javadoc/
> > > >
> > > > * Tag to be voted upon (off 2.2 branch) is the 2.2.0 tag:
> > > > https://github.com/apache/kafka/releases/tag/2.2.0-rc2
> > > >
> > > > * Documentation:
> > > > https://kafka.apache.org/22/documentation.html
> > > >
> > > > * Protocol:
> > > > https://kafka.apache.org/22/protocol.html
> > > >
> > > > * Jenkins builds for the 2.2 branch:
> > > > Unit/integration tests:
> https://builds.apache.org/job/kafka-2.2-jdk8/
> > > > System tests:
> > > https://jenkins.confluent.io/job/system-test-kafka/job/2.2/
> > > >
> > > > /**
> > > >
> > > > Thanks,
> > > >
> > > > -Matthias
> > > >
> > > >
> > >
> >
>
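
To illustrate two of the API changes listed in the release notes quoted above
(Producer close(Duration) and KafkaStreams implementing AutoCloseable), here
is a minimal sketch; the broker address, topic names, and application id are
placeholders, not anything from the release itself:

import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;

public class Api220Sketch {
    public static void main(String[] args) {
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());

        // New in 2.2: close(Duration) instead of close(long, TimeUnit).
        KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);
        producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        producer.close(Duration.ofSeconds(10));

        // New in 2.2: KafkaStreams implements AutoCloseable, so
        // try-with-resources closes it automatically.
        Properties streamsProps = new Properties();
        streamsProps.put("application.id", "api-2-2-sketch");      // placeholder
        streamsProps.put("bootstrap.servers", "localhost:9092");
        streamsProps.put("default.key.serde", Serdes.String().getClass());
        streamsProps.put("default.value.serde", Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");          // trivial pass-through topology
        Topology topology = builder.build();
        try (KafkaStreams streams = new KafkaStreams(topology, streamsProps)) {
            streams.start();
        } // streams.close() is called automatically here
    }
}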


Re: Using kafka with RESTful API

2019-03-19 Thread Desmond Lim
Thank you!


Re: kafka latency for large message

2019-03-19 Thread Nan Xu
That's very good information from the slides, thanks. Our design uses Kafka
for two purposes: one is as a cache, which we cover with a KTable; the second
is as a message delivery mechanism to send data to other systems. Because we
care very much about latency, a KTable over a compacted topic suits us very
well; if we had to find another system to do the caching, a big change would
be involved. The approach described in the slides, which breaks a message
into smaller chunks and then reassembles them, seems like a viable solution
(rough sketch below).
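
A rough sketch of what the producer-side chunking could look like (the topic
name, chunk size, and header names here are made up for illustration; the
slides describe the general idea, not this exact code):

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ChunkingProducerSketch {
    // 512 KB chunks stay well under the default 1 MB max.request.size.
    static final int CHUNK_SIZE = 512 * 1024;

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());

        byte[] largePayload = new byte[30 * 1024 * 1024];    // stand-in for a ~30M message

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            String messageId = UUID.randomUUID().toString();
            int totalChunks = (largePayload.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
            for (int i = 0; i < totalChunks; i++) {
                int from = i * CHUNK_SIZE;
                int to = Math.min(from + CHUNK_SIZE, largePayload.length);
                byte[] chunk = Arrays.copyOfRange(largePayload, from, to);
                // Same key for every chunk so all chunks land in the same
                // partition and stay in order.
                ProducerRecord<String, byte[]> record =
                        new ProducerRecord<>("large-messages", messageId, chunk);
                record.headers().add("chunk-index",
                        Integer.toString(i).getBytes(StandardCharsets.UTF_8));
                record.headers().add("chunk-count",
                        Integer.toString(totalChunks).getBytes(StandardCharsets.UTF_8));
                producer.send(record);
            }
            producer.flush();
        }
        // A consumer would buffer chunks by messageId and reassemble the
        // payload once chunk-count chunks have arrived.
    }
}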

Do you know why Kafka doesn't have linear latency for big messages compared
to small ones? For a 2M message I see an average latency of less than 10 ms,
so I was expecting a 30M message to have a latency of less than 10 * 20 =
200 ms.

On Mon, Mar 18, 2019 at 3:29 PM Bruce Markey  wrote:

> Hi Nan,
>
> Would you consider other approaches that may actually be a more efficient
> solution for you? There is a slide deck Handle Large Messages In Apache
> Kafka
> <
> https://www.slideshare.net/JiangjieQin/handle-large-messages-in-apache-kafka-58692297
> >.
> For messages this large, one of the approaches suggested is Reference Based
> Messaging where you write your large files to an external data store then
> produce a small Apache Kafka message with a reference for where to find the
> file. This would allow your consumer applications to find the file as
> needed rather than storing all that data in the event log.
>
> --  bjm
>
> On Thu, Mar 14, 2019 at 1:53 PM Xu, Nan  wrote:
>
> > Hi,
> >
> > We are using Kafka to send messages, and less than 1% of the messages
> > are very big, close to 30M. We understand Kafka is not ideal for sending
> > big messages, but because the large-message rate is very low, we just want
> > to let Kafka do it anyway. We still want a reasonable latency, though.
> >
> > To test, I set up a topic named "test" on a single-broker local Kafka,
> > with only 1 partition and 1 replica, using the following command:
> >
> > ./kafka-producer-perf-test.sh  --topic test --num-records 200
> > --throughput 1 --record-size 30000000 --producer.config
> > ../config/producer.properties
> >
> > Producer.config
> >
> > #Max 40M message
> > max.request.size=40000000
> > buffer.memory=40000000
> >
> > #2M buffer
> > send.buffer.bytes=2000000
> >
> > 6 records sent, 1.1 records/sec (31.00 MB/sec), 973.0 ms avg latency,
> > 1386.0 max latency.
> > 6 records sent, 1.0 records/sec (28.91 MB/sec), 787.2 ms avg latency,
> > 1313.0 max latency.
> > 5 records sent, 1.0 records/sec (27.92 MB/sec), 582.8 ms avg latency,
> > 643.0 max latency.
> > 6 records sent, 1.1 records/sec (30.16 MB/sec), 685.3 ms avg latency,
> > 1171.0 max latency.
> > 5 records sent, 1.0 records/sec (27.92 MB/sec), 629.4 ms avg latency,
> > 729.0 max latency.
> > 5 records sent, 1.0 records/sec (27.61 MB/sec), 635.6 ms avg latency,
> > 673.0 max latency.
> > 6 records sent, 1.1 records/sec (30.09 MB/sec), 736.2 ms avg latency,
> > 1255.0 max latency.
> > 5 records sent, 1.0 records/sec (27.62 MB/sec), 626.8 ms avg latency,
> > 685.0 max latency.
> > 5 records sent, 1.0 records/sec (28.38 MB/sec), 608.8 ms avg latency,
> > 685.0 max latency.
> >
> >
> > On the broker, I change the
> >
> > socket.send.buffer.bytes=2024000
> > # The receive buffer (SO_RCVBUF) used by the socket server
> > socket.receive.buffer.bytes=2224000
> >
> > and all others are default.
> >
> > I am a little surprised to see about 1 s max latency and an average of
> > about 0.5 s. My understanding is that Kafka memory-maps the log file and
> > lets the system flush it, and all writes are sequential, so the flush
> > should not be affected by message size that much. Batching and the network
> > will take longer, but those are in-memory and on the local machine, and my
> > SSD should be far better than 0.5 seconds. Where is the time being
> > consumed? Any suggestions?
> >
> > Thanks,
> > Nan
> >
> >
> >
> >
> >
> >
> >
> > --
> > This message, and any attachments, is for the intended recipient(s) only,
> > may contain information that is privileged, confidential and/or
> proprietary
> > and subject to important terms and conditions available at
> > http://www.bankofamerica.com/emaildisclaimer.   If you are not the
> > intended recipient, please delete this message.
> >
>


Re: [VOTE] 2.2.0 RC2

2019-03-19 Thread Manikumar
+1 (non-binding)

- Verified the artifacts, build from src, ran tests
- Verified the quickstart, ran producer/consumer performance tests.

Thanks for running the release!

Thanks,
Manikumar

On Wed, Mar 20, 2019 at 12:19 AM David Arthur 
wrote:

> +1
>
> Validated signatures, and ran through quick-start.
>
> Thanks!
>
> On Mon, Mar 18, 2019 at 4:00 AM Jakub Scholz  wrote:
>
> > +1 (non-binding). I used the staged binaries and run some of my tests
> > against them. All seems to look good to me.
> >
> > On Sat, Mar 9, 2019 at 11:56 PM Matthias J. Sax 
> > wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the third candidate for release of Apache Kafka 2.2.0.
> > >
> > >  - Added SSL support for custom principal name
> > >  - Allow SASL connections to periodically re-authenticate
> > >  - Command line tool bin/kafka-topics.sh adds AdminClient support
> > >  - Improved consumer group management
> > >- default group.id is `null` instead of empty string
> > >  - API improvement
> > >- Producer: introduce close(Duration)
> > >- AdminClient: introduce close(Duration)
> > >- Kafka Streams: new flatTransform() operator in Streams DSL
> > >- KafkaStreams (and other classes) now implement AutoCloseable to
> > > support try-with-resources
> > >- New Serdes and default method implementations
> > >  - Kafka Streams exposed internal client.id via ThreadMetadata
> > >  - Metric improvements:  All `-min`, `-avg` and `-max` metrics will now
> > > output `NaN` as default value
> > > Release notes for the 2.2.0 release:
> > > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/RELEASE_NOTES.html
> > >
> > > *** Please download, test, and vote by Thursday, March 14, 9am PST.
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > https://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >
> > > * Javadoc:
> > > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/javadoc/
> > >
> > > * Tag to be voted upon (off 2.2 branch) is the 2.2.0 tag:
> > > https://github.com/apache/kafka/releases/tag/2.2.0-rc2
> > >
> > > * Documentation:
> > > https://kafka.apache.org/22/documentation.html
> > >
> > > * Protocol:
> > > https://kafka.apache.org/22/protocol.html
> > >
> > > * Jenkins builds for the 2.2 branch:
> > > Unit/integration tests: https://builds.apache.org/job/kafka-2.2-jdk8/
> > > System tests:
> > https://jenkins.confluent.io/job/system-test-kafka/job/2.2/
> > >
> > > /**
> > >
> > > Thanks,
> > >
> > > -Matthias
> > >
> > >
> >
>


Re: [VOTE] 2.2.0 RC2

2019-03-19 Thread David Arthur
+1

Validated signatures, and ran through quick-start.

Thanks!

On Mon, Mar 18, 2019 at 4:00 AM Jakub Scholz  wrote:

> +1 (non-binding). I used the staged binaries and run some of my tests
> against them. All seems to look good to me.
>
> On Sat, Mar 9, 2019 at 11:56 PM Matthias J. Sax 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the third candidate for release of Apache Kafka 2.2.0.
> >
> >  - Added SSL support for custom principal name
> >  - Allow SASL connections to periodically re-authenticate
> >  - Command line tool bin/kafka-topics.sh adds AdminClient support
> >  - Improved consumer group management
> >- default group.id is `null` instead of empty string
> >  - API improvement
> >- Producer: introduce close(Duration)
> >- AdminClient: introduce close(Duration)
> >- Kafka Streams: new flatTransform() operator in Streams DSL
> >- KafkaStreams (and other classes) now implement AutoCloseable to
> > support try-with-resources
> >- New Serdes and default method implementations
> >  - Kafka Streams exposed internal client.id via ThreadMetadata
> >  - Metric improvements:  All `-min`, `-avg` and `-max` metrics will now
> > output `NaN` as default value
> > Release notes for the 2.2.0 release:
> > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/RELEASE_NOTES.html
> >
> > *** Please download, test, and vote by Thursday, March 14, 9am PST.
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/javadoc/
> >
> > * Tag to be voted upon (off 2.2 branch) is the 2.2.0 tag:
> > https://github.com/apache/kafka/releases/tag/2.2.0-rc2
> >
> > * Documentation:
> > https://kafka.apache.org/22/documentation.html
> >
> > * Protocol:
> > https://kafka.apache.org/22/protocol.html
> >
> > * Jenkins builds for the 2.2 branch:
> > Unit/integration tests: https://builds.apache.org/job/kafka-2.2-jdk8/
> > System tests:
> https://jenkins.confluent.io/job/system-test-kafka/job/2.2/
> >
> > /**
> >
> > Thanks,
> >
> > -Matthias
> >
> >
>


Re: NOT_LEADER_FOR_PARTITION - Kafka Producer

2019-03-19 Thread Devang Shah
I'd appreciate it if anyone could shed some light on this issue.

Thank you.

On Sun, Mar 17, 2019, 12:06 PM Devang Shah  wrote:

> Dear Kafka Users,
>
> We have recently discovered *NotLeaderForPartition* errors in our Kafka producer.
>
> *Below are the versions we are using,*
> Kafka Version (server) - 2.11.0-0.10.2.0
> Kafka Client - 0.11.0.1
> Spring Kafka - 1.3.3
> Camel Kafka - 2.20.2
>
> *Background*
> We have 4 brokers in the Kafka cluster. We have one topic with four partitions
> and a replication factor of one. We have one producer which sends to a specific
> partition (one of the four) based on key settings, and four
> consumers (one for each partition). On the incident day, two of the brokers
> were down (which went unnoticed). The other two brokers were running fine, but
> suddenly we started receiving *NotLeaderForPartition* in the producer
> logs. I observed certain logs on the Kafka server which said "Replication
> thread shutdown successfully" and it started the truncation process.
>
> *Temporary issue resolution*
> The Producer/Consumer were bounced for our application to start processing
> the messages. The Kafka cluster was not bounced and the other two brokers
> were still down.
>
> *Questions/Queries,*
> 1. Does the truncation of logs on Kafka brokers affect the
> application process? Would it take the Kafka cluster offline, or does the
> truncation happen in the background?
> 2. We have the property *log.retention.check.interval.ms* set to *300000
> (every 5 mins)*. I think this is the default, and we could make it a
> daily/weekly activity to reduce load on the Kafka brokers and let them
> focus on application processing.
> 3. Is there any known bug in *kafka-client-0.11.0.1* with respect to
> reconnection after a network glitch?
>
> Any pointers to resolve the above will be helpful. Thank you.
>
> Thanks & Regards,
> Devang
>


Re: Proxying the Kafka protocol

2019-03-19 Thread Hans Jespersen


You might want to take a look at kafka-proxy (see
https://github.com/grepplabs/kafka-proxy).
It's a true Kafka protocol proxy and modifies metadata such as the advertised
listeners, so it works when there is no IP routing between the client and the
brokers.

-hans





> On Mar 19, 2019, at 8:19 AM, James Grant  wrote:
> 
> Hello,
> 
> We would like to expose a Kafka cluster running on one network to clients
> that are running on other networks without having to have full routing
> between the two networks. In this case these networks are in different AWS
> accounts but the concept applies more widely. We would like to access Kafka
> over a single (or very few) host names.
> 
> In addition we would like to filter incoming messages to enforce some level
> of data quality and also impose some access control.
> 
> A solution we are looking into is to provide a Kafka protocol level proxy
> that presents to clients as a single node Kafka cluster holding all the
> topics and partitions of the cluster behind it. This proxy would be able to
> operate in a load balanced cluster behind a single DNS entry and would also
> be able to intercept and filter/alter messages as they passed through.
> 
> The advantages we see in this approach over the HTTP proxy is that it
> presents the Kafka protocol whilst also meaning that we can use a typical
> TCP level load balancer that it is easy to route connections to. This means
> that we continue to use native Kafka clients.
> 
> Does anything like this already exist? Does anybody think it would be useful?
> Does anybody know of any reason it would be impossible (or a bad idea) to
> do?
> 
> James Grant
> 
> Developer - Expedia Group



Re: Proxying the Kafka protocol

2019-03-19 Thread Matt Veitas
You might follow along with the Envoy proxy team and the work they are
doing to support the Kafka binary protocol:
https://github.com/envoyproxy/envoy/issues/2852

On Tue, Mar 19, 2019 at 11:46 AM Peter Bukowinski  wrote:

> https://docs.confluent.io/3.0.0/kafka-rest/docs/intro.html
>
> The Kafka REST proxy may be what you need. You can put multiple instances
> behind a load balancer to scale to your needs.
>
>
> -- Peter (from phone)
>
> > On Mar 19, 2019, at 8:30 AM, Ryanne Dolan  wrote:
> >
> > Hello James, I'm not aware of anything like that for Kafka, but you can
> use
> > MirrorMaker for network segmentation. With this approach you have one
> Kafka
> > cluster in each segment and a MM cluster in the more privileged segment.
> > You don't need to expose the privileged segment at all -- you just need
> to
> > let MM reach the external segment(s).
> >
> > Ryanne
> >
> >> On Tue, Mar 19, 2019, 10:20 AM James Grant  wrote:
> >>
> >> Hello,
> >>
> >> We would like to expose a Kafka cluster running on one network to
> clients
> >> that are running on other networks without having to have full routing
> >> between the two networks. In this case these networks are in different
> AWS
> >> accounts but the concept applies more widely. We would like to access
> Kafka
> >> over a single (or very few) host names.
> >>
> >> In addition we would like to filter incoming messages to enforce some
> level
> >> of data quality and also impose some access control.
> >>
> >> A solution we are looking into is to provide a Kafka protocol level
> proxy
> >> that presents to clients as a single node Kafka cluster holding all the
> >> topics and partitions of the cluster behind it. This proxy would be
> able to
> >> operate in a load balanced cluster behind a single DNS entry and would
> also
> >> be able to intercept and filter/alter messages as they passed through.
> >>
> >> The advantages we see in this approach over the HTTP proxy is that it
> >> presents the Kafka protocol whilst also meaning that we can use a
> typical
> >> TCP level load balancer that it is easy to route connections to. This
> means
> >> that we continue to use native Kafka clients.
> >>
> >> Does anything like this already exist? Does anybody think it would be
> >> useful?
> >> Does anybody know of any reason it would be impossible (or a bad idea)
> to
> >> do?
> >>
> >> James Grant
> >>
> >> Developer - Expedia Group
> >>
>


Re: Proxying the Kafka protocol

2019-03-19 Thread Peter Bukowinski
https://docs.confluent.io/3.0.0/kafka-rest/docs/intro.html

The Kafka REST proxy may be what you need. You can put multiple instances 
behind a load balancer to scale to your needs.
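
As a minimal sketch of what producing through the REST proxy looks like (the
host name, port, and topic are placeholders; this assumes the JSON embedded
format of the v2 API):

curl -X POST \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  --data '{"records":[{"value":{"greeting":"hello"}}]}' \
  http://rest-proxy.example.com:8082/topics/demo-topic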


-- Peter (from phone)

> On Mar 19, 2019, at 8:30 AM, Ryanne Dolan  wrote:
> 
> Hello James, I'm not aware of anything like that for Kafka, but you can use
> MirrorMaker for network segmentation. With this approach you have one Kafka
> cluster in each segment and a MM cluster in the more privileged segment.
> You don't need to expose the privileged segment at all -- you just need to
> let MM reach the external segment(s).
> 
> Ryanne
> 
>> On Tue, Mar 19, 2019, 10:20 AM James Grant  wrote:
>> 
>> Hello,
>> 
>> We would like to expose a Kafka cluster running on one network to clients
>> that are running on other networks without having to have full routing
>> between the two networks. In this case these networks are in different AWS
>> accounts but the concept applies more widely. We would like to access Kafka
>> over a single (or very few) host names.
>> 
>> In addition we would like to filter incoming messages to enforce some level
>> of data quality and also impose some access control.
>> 
>> A solution we are looking into is to provide a Kafka protocol level proxy
>> that presents to clients as a single node Kafka cluster holding all the
>> topics and partitions of the cluster behind it. This proxy would be able to
>> operate in a load balanced cluster behind a single DNS entry and would also
>> be able to intercept and filter/alter messages as they passed through.
>> 
>> The advantages we see in this approach over the HTTP proxy is that it
>> presents the Kafka protocol whilst also meaning that we can use a typical
>> TCP level load balancer that it is easy to route connections to. This means
>> that we continue to use native Kafka clients.
>> 
>> Does anything like this already exist? Does anybody think it would be useful?
>> Does anybody know of any reason it would be impossible (or a bad idea) to
>> do?
>> 
>> James Grant
>> 
>> Developer - Expedia Group
>> 


Re: Proxying the Kafka protocol

2019-03-19 Thread Ryanne Dolan
Hello James, I'm not aware of anything like that for Kafka, but you can use
MirrorMaker for network segmentation. With this approach you have one Kafka
cluster in each segment and a MM cluster in the more privileged segment.
You don't need to expose the privileged segment at all -- you just need to
let MM reach the external segment(s).
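
For example, a legacy MirrorMaker run in the privileged segment might look
roughly like this (the property file names and the whitelist are
placeholders):

bin/kafka-mirror-maker.sh \
  --consumer.config source-cluster-consumer.properties \
  --producer.config target-cluster-producer.properties \
  --whitelist ".*"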

Ryanne

On Tue, Mar 19, 2019, 10:20 AM James Grant  wrote:

> Hello,
>
> We would like to expose a Kafka cluster running on one network to clients
> that are running on other networks without having to have full routing
> between the two networks. In this case these networks are in different AWS
> accounts but the concept applies more widely. We would like to access Kafka
> over a single (or very few) host names.
>
> In addition we would like to filter incoming messages to enforce some level
> of data quality and also impose some access control.
>
> A solution we are looking into is to provide a Kafka protocol level proxy
> that presents to clients as a single node Kafka cluster holding all the
> topics and partitions of the cluster behind it. This proxy would be able to
> operate in a load balanced cluster behind a single DNS entry and would also
> be able to intercept and filter/alter messages as they passed through.
>
> The advantages we see in this approach over the HTTP proxy is that it
> presents the Kafka protocol whilst also meaning that we can use a typical
> TCP level load balancer that it is easy to route connections to. This means
> that we continue to use native Kafka clients.
>
> Does anything like this already exist? Does anybody think it would be useful?
> Does anybody know of any reason it would be impossible (or a bad idea) to
> do?
>
> James Grant
>
> Developer - Expedia Group
>


Proxying the Kafka protocol

2019-03-19 Thread James Grant
Hello,

We would like to expose a Kafka cluster running on one network to clients
that are running on other networks without having to have full routing
between the two networks. In this case these networks are in different AWS
accounts but the concept applies more widely. We would like to access Kafka
over a single (or very few) host names.

In addition we would like to filter incoming messages to enforce some level
of data quality and also impose some access control.

A solution we are looking into is to provide a Kafka protocol level proxy
that presents to clients as a single node Kafka cluster holding all the
topics and partitions of the cluster behind it. This proxy would be able to
operate in a load balanced cluster behind a single DNS entry and would also
be able to intercept and filter/alter messages as they passed through.

The advantage we see in this approach over the HTTP proxy is that it
presents the Kafka protocol, while also meaning that we can use a typical
TCP-level load balancer that is easy to route connections to. This means
that we can continue to use native Kafka clients.

Does anything like this already exist? Does anybody think it would be useful?
Does anybody know of any reason it would be impossible (or a bad idea) to
do?

James Grant

Developer - Expedia Group


Re: Rebalancing and Its Impact

2019-03-19 Thread Janardhanan V S
Hey everyone,

Requesting some help here..

Thanks,
Janardhanan V S

On Sat, Mar 16, 2019 at 7:36 AM Janardhanan V S 
wrote:

> Hi,
>
> I'm new to Kafka and I'm trying to design a wrapper library in both Java
> and Go (uses Confluent/Kafka-Go) for Kafka to be used internally. For my
> use-case, CommitSync is a crucial step and we should do a read only after
> properly committing the old one. Repeated processing is not a big issue and
> our client service is idempotent enough. But data loss is a major issue and
> should not occur.
>
> I will create X number of consumers initially and will keep on polling
> from them. Hence I would like to know more about the negative scenarios
> that could happen here, their impact, and how to properly handle them.
>
> I would like to know more about:
>
> 1) Network issue during consumer processing:
>  What happens when the network goes off for a brief period and comes back?
> Does the Kafka consumer handle this automatically and become alive again
> when the network comes back, or do we have to reinitialise it? If it comes
> back alive, does it resume work from where it left off?
> Eg: Consumer X read 50 records from Partition Y, so internally the consumer
> offset moved forward by 50. But before committing, a network issue happens
> and then it comes back alive. Will the consumer still have the metadata
> about what it read in the last poll? Can it go on to commit the +50 offset?
>
> 2) Rebalancing in consumer groups, and its impact on an existing consumer
> process: will an existing, working consumer instance pause and resume work
> during a rebalance, or do we have to reinitialize it? How long can a
> rebalance take? If the consumer comes back alive after a rebalance, does it
> have metadata about its last read?
>
> 3) What happens when a consumer joins during a rebalance? Ideally it is
> again a rebalancing scenario. What will happen then? Will the existing
> rebalance be discarded and a new one started, or will it wait for the
> existing rebalance to complete?
>
> Kindly help me understand these scenarios in detail and suggest
> solutions if possible. It would also be very helpful if you could
> point me to a resource - an online article / book or anything that provides
> insight into the intricate details of Kafka.
>
>
>
> Thanks and Regards,
> Janardhanan V S
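
A minimal sketch of the commit-after-processing loop described above, using
the plain Java consumer (the broker address, group id, and topic are
placeholders, and the wrapper-library details from the question are not
modeled here):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CommitAfterProcessingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // placeholder
        props.put("group.id", "wrapper-demo");                // placeholder
        props.put("enable.auto.commit", "false");             // commit manually, after processing
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);  // processing is idempotent, as in the question
                }
                // Commit only after the whole batch is processed: a crash before
                // this line means the batch is re-read and re-processed, not lost.
                if (!records.isEmpty()) {
                    consumer.commitSync();
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
    }
}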
>