Re: How to "buffer" a stream with high churn and output only at the end of a window?

2016-04-20 Thread Henry Cai
My use case is actually myTable.aggregate().to("output_topic"), so I need a way to reduce the number of outputs. I don't think correlating the internal cache flush with the output window emit frequency is ideal; it's too hard for an application developer to see when the cache will be flushed, we

Re: KafkaProducer Retries in .9.0.1

2016-04-20 Thread Ismael Juma
Hi Nicolas, That seems to be a different issue than the one initially discussed in this thread. I suggest starting a new mailing list thread with the steps required to reproduce the problem. Thanks, Ismael On Wed, Apr 20, 2016 at 10:41 PM, Nicolas Phung wrote: > Hi

Re: KafkaProducer Retries in .9.0.1

2016-04-20 Thread Nicolas Phung
Hi Ismael, Thanks for your reply. For me, it's happening when I'm doing various breakdowns (shutting down instances / ZooKeeper) on Kafka brokers on 0.9.0.1 to simulate a leader-not-available case. The same kind of breakdown on a 0.8.2.2 client/broker retries as expected. From my

Re: KafkaProducer Retries in .9.0.1

2016-04-20 Thread Ismael Juma
Hi, This was explained earlier, I think. Retries are only attempted for retriable errors. If a message is too large, retrying won't help (it will still be too large). However, if a leader is not available, then a retry will happen as the leader may be available then. Ismael On Wed, Apr 20, 2016

Re: Is there a tool to list the topic-partitions on a particular broker

2016-04-20 Thread Vijay Patil
There is an awesome tool called "kafka-manager", which was open sourced by Yahoo. https://github.com/yahoo/kafka-manager On 21 April 2016 at 08:07, Rajiv Kurian wrote: > The kafka-topics.sh tool lists topics and where the partitions are. Is > there a similar tool where I

Is there a tool to list the topic-partitions on a particular broker

2016-04-20 Thread Rajiv Kurian
The kafka-topics.sh tool lists topics and where the partitions are. Is there a similar tool where I could give it a broker id and it would give me all the topic-partitions on it? I want to bring down a few brokers but before doing that I want to make sure that I've migrated all topics away from
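There is no built-in flag for this in these versions, but the `--describe` output can be filtered. A minimal sketch (the topics, broker id, and sample output below are hypothetical; in practice you would pipe the real `kafka-topics.sh --zookeeper <zk> --describe` output into the same awk filter):

```shell
# Hypothetical sample of `kafka-topics.sh --describe` output. Real output is
# tab-separated, but awk's default field splitting handles any whitespace.
describe_output='Topic: logs Partition: 0 Leader: 1 Replicas: 1,3 Isr: 1,3
Topic: logs Partition: 1 Leader: 3 Replicas: 3,2 Isr: 3,2
Topic: metrics Partition: 0 Leader: 2 Replicas: 2,1 Isr: 2,1'

# Print every topic-partition whose replica list contains broker id 3.
on_broker=$(printf '%s\n' "$describe_output" | awk -v b=3 '
  $7 == "Replicas:" {
    n = split($8, r, ",")
    for (i = 1; i <= n; i++) if (r[i] == b) { print $2 "-" $4; break }
  }')
echo "$on_broker"
```

This only tells you which partitions have a replica on the broker; actually moving them away still needs the partition reassignment tool.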

Re: How to "buffer" a stream with high churn and output only at the end of a window?

2016-04-20 Thread Henry Cai
0.10.0.1 is fine for me; I am actually building from trunk head for the streams package. On Wed, Apr 20, 2016 at 5:06 PM, Guozhang Wang wrote: > I saw that note, thanks for commenting. > > I am cutting the next 0.10.0.0 RC next week, so I am not certain if it > will make it for

electing leader failed and result in 0 latest offset

2016-04-20 Thread Qi Xu
Hi folks, Recently we ran into an odd issue where some partitions' latest offsets become 0. Here's the snapshot from Kafka Manager; as you can see, partitions 2 and 3 become zero. *Partition* *Latest Offset* *Leader* *Replicas* *In Sync Replicas* *Preferred Leader?* *Under Replicated?* 0

Re: How to "buffer" a stream with high churn and output only at the end of a window?

2016-04-20 Thread Guozhang Wang
I saw that note, thanks for commenting. I am cutting the next 0.10.0.0 RC next week, so I am not certain if it will make it into 0.10.0.0. But we can push it to be in 0.10.0.1. Guozhang On Wed, Apr 20, 2016 at 4:57 PM, Henry Cai wrote: > Thanks. > > Do you know

Re: Tune Kafka offsets.load.buffer.size

2016-04-20 Thread Muqtafi Akhmad
Thank you Ben! On Thu, Apr 21, 2016 at 12:59 AM, Ben Stopford wrote: > If you have a relatively small number of consumers you might further > reduce offsets.topic.segment.bytes. The active segment is not compacted. > B > > On 18 Apr 2016, at 23:45, Muqtafi Akhmad

Re: How to "buffer" a stream with high churn and output only at the end of a window?

2016-04-20 Thread Guozhang Wang
Henry, I thought you were concerned about consumer memory contention. That's a valid point, and yes, you need to keep those buffered records in a persistent store. As I mentioned, we are trying to optimize the aggregation outputs as in https://issues.apache.org/jira/browse/KAFKA-3101 Its
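The buffering scheme debated in this thread can be sketched as a toy class (plain Java, no Kafka APIs; in a real topology the buffer would live in a persistent state store and `flush()` would be driven by `punctuate()` at window boundaries):

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of "hold updates, emit once per window": high-churn updates for
// the same key overwrite each other in a buffer, and only the final value per
// key is forwarded when the (simulated) window closes.
public class WindowedBuffer<K, V> {
    private final Map<K, V> pending = new HashMap<>();

    public void update(K key, V value) {
        pending.put(key, value);          // overwrite; earlier churn is dropped
    }

    // Called at window end (in a real Processor, from punctuate()).
    public Map<K, V> flush() {
        Map<K, V> out = new HashMap<>(pending);
        pending.clear();
        return out;
    }

    public static void main(String[] args) {
        WindowedBuffer<String, Long> counts = new WindowedBuffer<>();
        counts.update("ad-1", 10L);
        counts.update("ad-1", 11L);       // overwrites; only the latest survives
        counts.update("ad-2", 5L);
        System.out.println(counts.flush()); // two entries, latest values only
    }
}
```

Because the buffer is cleared on flush, a crash between window end and flush would lose data unless the pending map is backed by a changelog-replicated store, which is exactly the persistence point made above.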

Re: How to "buffer" a stream with high churn and output only at the end of a window?

2016-04-20 Thread Henry Cai
I think this scheme still has problems. If during 'holding' I literally hold (don't return from the method call), I will starve the thread. If I write the output to an in-memory buffer and let the method return, Kafka Streams will acknowledge the record to the upstream queue as processed, so I

Re: How to "buffer" a stream with high churn and output only at the end of a window?

2016-04-20 Thread Henry Cai
So holding the stream for 15 minutes wouldn't cause too many performance problems? On Wed, Apr 20, 2016 at 3:16 PM, Guozhang Wang wrote: > The consumer's buffer does not depend on offset committing; once a record is > returned from the poll() call it is out of the buffer. If offsets are not

Re: How to "buffer" a stream with high churn and output only at the end of a window?

2016-04-20 Thread Guozhang Wang
The consumer's buffer does not depend on offset committing; once a record is returned from the poll() call it is out of the buffer. If offsets are not committed, then upon failover it will simply re-consume these records again from Kafka. Guozhang On Tue, Apr 19, 2016 at 11:34 PM, Henry Cai

Re: Using SSL with KafkaConsumer w/o client certificates

2016-04-20 Thread marko
After making the suggested change, I see this error during startup [2016-04-20 18:03:10,522] INFO [Kafka Server 0], started (kafka.server.KafkaServer) [2016-04-20 18:03:11,093] WARN Failed to send SSL Close message (org.apache.kafka.common.network.SslTransportLayer) java.io.IOException: Broken

Re: Kafka Streams: finding a solution to a particular use case

2016-04-20 Thread Guozhang Wang
Thanks Henry! On Wed, Apr 20, 2016 at 2:51 PM, Henry Cai wrote: > Created two Kafka JIRAs: > > KAFKA-3595: Add capability to specify replication compact option for stream > store > > KAFKA-3596: Kafka Streams: Window expiration needs to consider more than > event

Re: Kafka Streams: finding a solution to a particular use case

2016-04-20 Thread Henry Cai
Created two Kafka JIRAs: KAFKA-3595: Add capability to specify replication compact option for stream store KAFKA-3596: Kafka Streams: Window expiration needs to consider more than event time On Wed, Apr 20, 2016 at 11:43 AM, Guozhang Wang wrote: > Henry, > > Yes for

Re: KafkaProducer Retries in .9.0.1

2016-04-20 Thread Nicolas Phung
Hello, Have you solved this? I'm encountering the same issue with the new producer on a 0.9.0.1 client with a 0.9.0.1 Kafka broker. We tried the same various breakdowns (kafka(s), zookeeper) with a 0.8.2.2 client and Kafka broker and the retries work as expected on the older version. I'm

Re: Spikes in kafka bytes out (while bytes in remain the same)

2016-04-20 Thread Tom Crayford
Is there any interest in changing this or exposing non-replicated bytes out somewhere via JMX? It'd be nice to expose a real "what the consumers are doing from the broker's perspective" metric as well as the current one, which munges together replication and other consumers. On Wed, Apr 20, 2016

Re: Spikes in kafka bytes out (while bytes in remain the same)

2016-04-20 Thread Jorge Rodriguez
Asaf, thanks for your explanation. This actually makes complete sense, as we have 2 replicas. So the math works out when taking this into consideration. Thanks! Jorge On Sat, Apr 16, 2016 at 9:32 PM, Asaf Mesika wrote: > Another thought: Brokers replicate data in. So a

How to run Consumer for a Long time

2016-04-20 Thread yeshwanth kumar
Hi, I am using CDH kafka_2.10 0.9.0-kafka-2.0.0. I wrote a Java Kafka consumer process which needs to run for a long time, 8-10 hrs. What properties do I need to set in order to run the consumer for a long time? How can I achieve real-time processing with Kafka? Can someone guide me
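For the 0.9 consumer, the main requirement is to keep calling `poll()` regularly and to size the session timeout to cover your processing gaps. A hedged sketch of consumer properties (the values are illustrative, not recommendations):

```
# Illustrative values only
enable.auto.commit=false        # commit manually after records are processed
session.timeout.ms=30000        # must cover the longest gap between poll() calls
heartbeat.interval.ms=3000      # typically a third of the session timeout or less
max.partition.fetch.bytes=1048576
```

Note that in the 0.9 consumer, heartbeats are only sent from within `poll()`, so a consumer that spends longer than `session.timeout.ms` processing a batch will be evicted from the group regardless of these settings.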

Re: Kafka Streams: finding a solution to a particular use case

2016-04-20 Thread Guozhang Wang
Henry, Yes, for joining windows the key is actually a combo of {join window, key, sequenceID} and hence all records are unique; we do not need log compaction for its changelogs. Guozhang On Tue, Apr 19, 2016 at 11:28 PM, Henry Cai wrote: > In my case, the key space

Re: Tune Kafka offsets.load.buffer.size

2016-04-20 Thread Ben Stopford
If you have a relatively small number of consumers you might further reduce offsets.topic.segment.bytes. The active segment is not compacted. B > On 18 Apr 2016, at 23:45, Muqtafi Akhmad wrote: > > dear Kafka users, > > Are there any tips about how to configure
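For reference, the broker-side settings discussed in this thread, shown with what I believe are the 0.9 defaults (to make the knobs concrete; verify against your broker docs):

```
# Smaller segments mean the active (uncompacted) segment rolls sooner,
# so compaction can reclaim stale offset entries faster.
offsets.topic.segment.bytes=104857600
# Batch size used when loading offsets into the cache on coordinator startup.
offsets.load.buffer.size=5242880
```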

Re-balance after metaData refresh

2016-04-20 Thread vinay sharma
Hi everyone, I see that on each metadata refresh a rebalance is triggered, and any consumer in the middle of processing starts throwing errors like "UNKNOWN_MEMBER_ID" on commit. There is no change in partitions, partition leadership, or brokers. Any idea what could cause this behavior? What

Default behavior for full broker

2016-04-20 Thread Lawrence Weikum
Hello, I'm curious about the expected or default behavior that might occur if a broker in the system has filled up. By that I mean when a broker has used all of its memory and disk space. Is the node simply removed from the system until space is cleared? As I'm thinking through this a
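In this version there is no graceful degradation for a completely full disk (the broker will typically hit I/O errors on log writes and shut down rather than being cleanly removed), so in practice disk usage is bounded up front with retention settings. An illustrative broker-properties sketch (values are placeholders, not recommendations):

```
# Cap each partition's log; oldest segments are deleted first.
log.retention.bytes=10737418240   # ~10 GB per partition
log.retention.hours=168           # or 7 days, whichever limit is hit first
log.segment.bytes=1073741824      # 1 GB segments
```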

Re: Kafka Streams: finding a solution to a particular use case

2016-04-20 Thread Guillermo Lammers Corral
Thank you very much for all your responses, I have learned a lot. Back to the question I asked at the start of the thread regarding the correct way to process two datasets (~a million records each) in which the corresponding entry in each KTable could be sent at any time, i.e., each one could have been

kafka producers and consumers on different machine

2016-04-20 Thread Dimitris Pappas
Dear sir/madam, My name is Dimitris Pappas and I am a research assistant at Athena R.C. in Athens. I downloaded the standalone version of Kafka and tried to create consumers and producers on other machines. The producers and consumers I built, which were listening on 'localhost', worked like a

Re: kafka producers and consumers on different machine

2016-04-20 Thread Marko Bonaći
I'm assuming that you created a topic with replication factor 3, while having only a single broker. Try with replication factor 1 or add additional brokers. Marko Bonaći Monitoring | Alerting | Anomaly Detection | Centralized Log Management Solr & Elasticsearch Support Sematext

Re: Authorization Question

2016-04-20 Thread Pawley, John
Thanks everyone, I think we've managed to clear up the confusion. We might go via Kerberos, however. But thanks for the help. John

Re: Kafka security

2016-04-20 Thread Srividhya Shanmugam
Thanks again. That clarified the question. On Wed, Apr 20, 2016 at 9:55 AM, Tom Crayford wrote: > Yes > > On Wed, Apr 20, 2016 at 2:52 PM, Srividhya Shanmugam < > srivishanmu...@gmail.com> wrote: > > > Yes, I followed those steps for setting up SSL based authentication.

Re: Authorization Question

2016-04-20 Thread Tom Crayford
Note that the SSL username is the subject of the client certificate - without client certs you don't get custom usernames. On Wed, Apr 20, 2016 at 2:39 PM, Harsh J wrote: > Username would need to come in from the authentication layer. > > What is your choice of

Re: Kafka security

2016-04-20 Thread Tom Crayford
Yes On Wed, Apr 20, 2016 at 2:52 PM, Srividhya Shanmugam < srivishanmu...@gmail.com> wrote: > Yes, I followed those steps for setting up SSL based authentication. OK, if > I understand correctly, the subject name of the client cert is what I need > to use when running the kafka-acls script to add

Re: Kafka security

2016-04-20 Thread Srividhya Shanmugam
Yes, I followed those steps for setting up SSL based authentication. OK, if I understand correctly, the subject name of the client cert is what I need to use when running the kafka-acls script to add ACLs on a topic. Those will be validated against the client cert truststore/keystore locations specified
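The command shape for that looks roughly as follows (the distinguished name, topic, and ZooKeeper address are placeholders; the principal must match the client certificate's subject DN exactly):

```
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
  --add \
  --allow-principal "User:CN=client1,OU=Eng,O=Example,L=City,ST=CA,C=US" \
  --operation Read --operation Write \
  --topic my-topic
```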

Re: Authorization Question

2016-04-20 Thread Harsh J
Username would need to come in from the authentication layer. What is your choice of authentication mode? Based on SSL vs. Kerberos, you'll need to configure the clients per http://kafka.apache.org/documentation.html#security_configclients (SSL) which requires using a configuration properties
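For the SSL case, the client-side properties file referenced above boils down to something like this (paths and passwords are placeholders; the keystore lines only matter when the broker requires client certificates via ssl.client.auth):

```
security.protocol=SSL
ssl.truststore.location=/var/private/ssl/client.truststore.jks
ssl.truststore.password=changeit
# Only needed if the broker sets ssl.client.auth=required:
#ssl.keystore.location=/var/private/ssl/client.keystore.jks
#ssl.keystore.password=changeit
#ssl.key.password=changeit
```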

Re: Kafka security

2016-04-20 Thread Tom Crayford
Yes: http://kafka.apache.org/documentation.html#security_ssl On Wed, Apr 20, 2016 at 2:29 PM, Srividhya Shanmugam < srivishanmu...@gmail.com> wrote: > Thanks Tom. Should the custom client cert be generated and signed by CA in > all brokers? Is there an example or more documentation on this? >

Re: Kafka security

2016-04-20 Thread Srividhya Shanmugam
Thanks Tom. Should the custom client cert be generated and signed by a CA on all brokers? Is there an example or more documentation on this? Sri On Wed, Apr 20, 2016 at 9:14 AM, Tom Crayford wrote: > Hi Sri, > > You can configure ACLs by using SSL client authentication with

Re: Kafka security

2016-04-20 Thread Tom Crayford
Hi Sri, You can configure ACLs by using SSL client authentication with a custom client cert - the subject of the client cert will be used as the ACL user. Thanks Tom On Wed, Apr 20, 2016 at 2:12 PM, Srividhya Shanmugam < srivishanmu...@gmail.com> wrote: > Kafka Team, > > I am trying to

Kafka security

2016-04-20 Thread Srividhya Shanmugam
Kafka Team, I am trying to integrate Kafka security. I was able to authenticate using SSL (TLS) with a single broker/client and a two-node setup. I started reading about ACLs and my understanding is that ACLs can be configured with Kerberos principals. Is there a way ACLs can be configured with

Re: Authorization Question

2016-04-20 Thread westfox
John, Setting up SASL with a username matching the one you set in the ACL will work for your case. You can follow the steps in the official documentation. Ping On Wed, Apr 20, 2016 at 6:08 AM, Pawley, John wrote: > Hello, > > We have managed to enable the SimpleAuthorizer for Kafka, and we can no >

Re: Kafka support needed

2016-04-20 Thread todd
Rsyslog (8.15+) now supports producing to Kafka, and doesn't require Java (that can be a bonus). Rsyslog can use a disk buffer; then, when it can connect to Kafka, it will start streaming logs until the connection drops. That's a pretty simple config, and there are lots of examples online. T
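A minimal sketch of such a configuration, assuming rsyslog's `omkafka` module and a disk-assisted action queue (the broker address, topic, and queue sizes are placeholders):

```
module(load="omkafka")
action(type="omkafka"
       broker=["kafka1:9092"]
       topic="syslog"
       queue.type="LinkedList"          # in-memory queue...
       queue.filename="kafka_buffer"    # ...spilled to disk when needed
       queue.maxdiskspace="1g"
       queue.saveonshutdown="on"
       action.resumeretrycount="-1")    # retry until Kafka is reachable again
```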

KafkaProducer NullPointerException

2016-04-20 Thread Prem Panchami
Hi, We have a Kafka producer app that participates in the larger system. It worked fine sending messages. We just added our new tracing utility (uses SLF4J, LOG4J2) which has a couple of loggers that use Kafka appenders. Now we get a null pointer exception when we try to create KafkaProducer.

Authorization Question

2016-04-20 Thread Pawley, John
Hello, We have managed to enable the SimpleAuthorizer for Kafka, and we can no longer connect to the local queue without authorization. However we can't figure out how to supply a username when trying to connect from the console producer. We have already added users with permissions via the

Re: Enable JMX on Kafka Brokers

2016-04-20 Thread Harsh J
Unless you are on 0.8.1, your JMX query object name should be 'kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec' By default the above name should give you an all-topics count, but you can also request it per-topic by using "topic=", for ex.:
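Constructing these names programmatically with the standard JMX API looks like this (the topic name `mytopic` is just an example; the metric names are the ones quoted above):

```java
import javax.management.ObjectName;

public class KafkaJmxNames {
    // Per-topic variant of the all-topics MessagesInPerSec metric name.
    public static ObjectName messagesInPerSec(String topic) throws Exception {
        return new ObjectName(
            "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=" + topic);
    }

    public static void main(String[] args) throws Exception {
        // All-topics count:
        ObjectName all = new ObjectName(
            "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec");
        System.out.println(all);
        System.out.println(messagesInPerSec("mytopic"));
    }
}
```

Against a live broker you would pass such a name to `MBeanServerConnection.getAttribute(...)`; which attributes (e.g. a count vs. a rate) are exposed depends on the metric type, so check with JConsole first.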

Re: Kafka Streams: finding a solution to a particular use case

2016-04-20 Thread Matthias J. Sax
Log compaction can also delete keys if the payload for a key is null: "Compaction also allows for deletes. A message with a key and a null payload will be treated as a delete from the log. This delete marker will cause any prior message with that key to be removed (as would any new message with
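The delete semantics quoted above can be modeled in a few lines of plain Java (a toy model of compaction's outcome, not the broker's actual implementation):

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompactionModel {
    // Toy model of log compaction: for each key the latest record wins, and a
    // null value (a "tombstone") removes the key entirely.
    public static Map<String, String> compact(List<? extends Map.Entry<String, String>> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (Map.Entry<String, String> rec : log) {
            if (rec.getValue() == null) {
                latest.remove(rec.getKey());   // delete marker
            } else {
                latest.put(rec.getKey(), rec.getValue());
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> log = Arrays.asList(
            new AbstractMap.SimpleEntry<>("a", "1"),
            new AbstractMap.SimpleEntry<>("b", "2"),
            new AbstractMap.SimpleEntry<>("a", null));   // tombstone for "a"
        System.out.println(compact(log));                // prints {b=2}
    }
}
```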

Re: Using SSL with KafkaConsumer w/o client certificates

2016-04-20 Thread Rajini Sivaram
If your only listener is SSL, you should set security.inter.broker.protocol to SSL even for single-broker cluster since it is used by the controller. I would have expected an error in the logs though if this was not configured correctly. On Wed, Apr 20, 2016 at 1:34 AM,

Re: Kafka support needed

2016-04-20 Thread Yogesh BG
Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Batch Expired at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:56) at

Re: How to "buffer" a stream with high churn and output only at the end of a window?

2016-04-20 Thread Henry Cai
For the technique of a custom Processor holding the call to context.forward(): if I hold it for 10 minutes, what does that mean for the consumer acknowledgement on the source node? I guess if I hold it for 10 minutes, the consumer is not going to ack to the upstream queue; will that impact the consumer

Re: Kafka Streams: finding a solution to a particular use case

2016-04-20 Thread Henry Cai
In my case, the key space is unbounded. The key would be something like 'ad_id', and this id is auto-incrementing all the time. I understand the benefit of using a compacted Kafka topic for the aggregation store, but I don't see much benefit in using compaction to replicate records in a JoinWindow (there