Re: Receiving "The session timeout is not within an acceptable range" but AFAIK it is within range

2016-05-03 Thread Jaikiran Pai
From what you pasted, I can't say for certain whether you are using those properties as consumer level settings or broker level settings. The group.min.session.timeout.ms and the group.max.session.timeout.ms are broker level settings (as far as I understand) and should be part of your broker
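
A minimal sketch of where each setting lives (assuming the 0.9 Java consumer; the values are examples only, and the group.* properties belong in the broker's server.properties, not in client code):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class SessionTimeoutConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "example-group");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            // Consumer-level setting: must fall inside the broker's allowed range.
            props.put("session.timeout.ms", "30000");

            // Broker-level settings (server.properties, NOT client config):
            //   group.min.session.timeout.ms=6000
            //   group.max.session.timeout.ms=30000
            // A session.timeout.ms outside that range is what produces
            // "The session timeout is not within an acceptable range".

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.close();
        }
    }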

Re: kafka 0.9 offset unknown after cleanup

2016-05-03 Thread Tom Crayford
Jun, Yep, you got it. If there are no offset commits for 24h, then your offset will disappear and be cleaned up. I think this behaviour is at least a little suboptimal and surprising, but it does prevent a lot of issues that arise when you rapidly add and remove consumer groups. I hope that
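
One way to keep offsets from expiring while a group is idle is to keep committing them. A rough sketch with the 0.9 new consumer, using hypothetical topic/group names (re-committing refreshes the offsets' commit timestamp, so, as far as I understand, they survive offsets.retention.minutes of inactivity):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class KeepOffsetsAliveSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "idle-group");            // hypothetical group
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("my-topic")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                // ... process records ...
                // Commit every iteration, even when nothing arrived, so the committed
                // offsets get a fresh timestamp and are not cleaned up after
                // offsets.retention.minutes of inactivity.
                consumer.commitSync();
            }
        }
    }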

Re: Suggestion for mixture of 0.8 and 0.9?

2016-05-03 Thread Tom Crayford
Hi there, JDK 6 is deprecated and there have been several security announcements for bugs that remain in it. You should make migrating off it a priority. Other than that, you *can* use the 0.8 producer/consumer with the 0.9 brokers - Kafka remains very backwards compatible. You won't get all the

Re: max number of partitions with v0.9.0

2016-05-03 Thread David Shepherd
Thanks Ben, that's what I thought and I believe your suggestion is essentially what I planned to implement. We have a single topic with raw messages that will be partitioned randomly on ingest (just for scalability). I planned to install a consumer group router that reads from this "raw" topic and

Re: reprocessing events from multiple topics

2016-05-03 Thread Benjamin Manns
Both of your ideas are doable. Another thing to keep in mind is that depending on your data source, late arriving data will not be sorted in front of the already committed events. You may need some windowing buffer to recalculate for stragglers. For the multiple-topic approach, check out Samza's

How to build kafka jar with all dependencies?

2016-05-03 Thread ravi singh
I used ./gradlew jarAll, but the Scala libs are still missing from the jar. It should be something very simple which I might be missing. Please let me know if anyone knows. -- Regards, Ravi

Suggestion for mixture of 0.8 and 0.9?

2016-05-03 Thread Flybean
Hi, everyone. We plan to use Kafka as the underlying messaging bus in our service framework. We want to use 0.9+ for ACLs, monitoring and more. In our scenario, the "old" applications are running on JDK 6, which means Kafka 0.9 is not suitable for them. So any suggestion for a mixture of 0.8 and 0.9?

Re: max number of partitions with v0.9.0

2016-05-03 Thread Benjamin Manns
From my knowledge (beginner's) each partition still requires at least a file selector on the Kafka brokers. The new consumer structure means consumers won't store data in Zookeeper, but topics and partitions still do. What I would do is key by your ID and place a rate limiting stream processor

Re: Kafka Monitoring using JMX Mbeans

2016-05-03 Thread Otis Gospodnetić
Hi, On Mon, Apr 25, 2016 at 4:14 AM, Mudit Kumar wrote: > Hi, > > Have anyone setup any monitoring using Mbeans ?What kind of command line > tools been used? > See https://sematext.com/spm/integrations/kafka-monitoring/ We use it for monitoring Kafka, ZooKeeper,
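
For anyone who wants to read the broker MBeans directly instead of using a hosted tool, a rough sketch (assumes the broker was started with JMX enabled, e.g. JMX_PORT=9999; the MBean shown is the broker's MessagesInPerSec meter):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class KafkaJmxProbeSketch {
        public static void main(String[] args) throws Exception {
            // Assumes the broker JVM exposes JMX on port 9999 (JMX_PORT=9999).
            JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                ObjectName messagesIn =
                    new ObjectName("kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec");
                // The meter exposes attributes such as Count, MeanRate, OneMinuteRate, ...
                Object rate = mbsc.getAttribute(messagesIn, "OneMinuteRate");
                System.out.println("MessagesInPerSec (1-minute rate): " + rate);
            } finally {
                connector.close();
            }
        }
    }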

Re: kafka 0.9 offset unknown after cleanup

2016-05-03 Thread Jun MA
I think I figured that out. Based on the explanation on http://www.slideshare.net/jjkoshy/offset-management-in-kafka , offsets.retention.minutes is used to clean up dead consumer groups. It means if one consumer group hasn’t commit

max number of partitions with v0.9.0

2016-05-03 Thread David Shepherd
I was wondering if the new Kafka Consumer introduced in 0.9.0 ( http://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client) allows for a higher number of partitions in a given cluster since it removes the zookeeper dependency. I understand the file descriptor

java.io.IOException: Broken pipe

2016-05-03 Thread ????
WARN Failed to send SSL Close message (org.apache.kafka.common.network.SslTransportLayer) java.io.IOException: Broken pipe at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) at

reprocessing events from multiple topics

2016-05-03 Thread Kyle Mathews
Hi Kafka Users, I'm thinking through how to convert my application to use Kafka. I use an event sourcing model and something I do frequently is reprocess old events when I change a model schema or update my processing code. In my current setup, I have few enough events that I can just load all

FetchRequest API Always Returns 10 Rows

2016-05-03 Thread Heath Ivie
Hi, I am in the process of writing a consumer using the binary protocol and I am having issues where the fetch request will only return 10 messages. Here is the code that I am using, so you can see the values (there are no settings hidden underneath): var frb = new

Re: How to define multiple serializers in kafka?

2016-05-03 Thread Ratha v
Thanks, I'll go through the schema registry. On 3 May 2016 at 22:46, Gerard Klijs wrote: > Then you're probably best off using the confluent schema registry, you can > then use the io.confluent.kafka.serializers.KafkaAvroDeserializer for the > client with

Re: batching related issue in 0.9.0 producer

2016-05-03 Thread Prabhu V
Hi Mayuresh, Staying on the BufferPool.java, could you tell me why we need the following piece of code if (this.availableMemory > 0 || !this.free.isEmpty()) { if (!this.waiters.isEmpty()) this.waiters.peekFirst().signal();

Re: Hash partition of key with skew

2016-05-03 Thread Wesley Chow
> > Upload to S3 is partitioned by the "key" field. I.e, one folder per key. It > does offset management to make sure offset commit is in sync with S3 upload. We do this in several spots and I wish we had built our system in such a way that we could just open source it. I’m sure many people

RE: Encryption at Rest

2016-05-03 Thread Martin Gainty
> From: jim_hoagl...@symantec.com > To: users@kafka.apache.org; mgai...@hotmail.com > Date: Tue, 3 May 2016 10:11:00 -0700 > Subject: Re: Encryption at Rest > > MG>curious

Re: batching related issue in 0.9.0 producer

2016-05-03 Thread Mayuresh Gharat
created a jira: https://issues.apache.org/jira/browse/KAFKA-3651 Thanks, Mayuresh On Tue, May 3, 2016 at 2:03 PM, Mayuresh Gharat wrote: > Nice catch. Do you have a jira for this? > I can submit a patch right away. This should be a small patch. > > Thanks, > >

Re: Suggestions of pulling local application logs into Kafka

2016-05-03 Thread David Birdsong
On Tue, May 3, 2016 at 1:00 PM Banias H wrote: > I should add Flume is not an option for various reasons. > > On Tue, May 3, 2016 at 2:49 PM, Banias H wrote: > > > We use Kafka (0.9.x) internally in our pipeline and now we would like to > > ingest

Re: batching related issue in 0.9.0 producer

2016-05-03 Thread Mayuresh Gharat
Nice catch. Do you have a jira for this? I can submit a patch right away. This should be a small patch. Thanks, Mayuresh On Tue, May 3, 2016 at 1:56 PM, Prabhu V wrote: > Whenever the BufferPool throws a "Failed to allocate memory within the > configured max blocking time"

batching related issue in 0.9.0 producer

2016-05-03 Thread Prabhu V
Whenever the BufferPool throws a "Failed to allocate memory within the configured max blocking time" exception, it should also remove the condition object from the waiters deque. Otherwise the condition object stays forever in the deque. (i.e.) "this.waiters.remove(moreMemory);" should happen
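
Not the actual BufferPool source, just a self-contained sketch of the pattern being suggested: wrap the wait in try/finally so the Condition is always removed from the waiters deque, including on the timeout path:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    public class WaiterCleanupSketch {
        private final ReentrantLock lock = new ReentrantLock();
        private final Deque<Condition> waiters = new ArrayDeque<>();
        private long availableMemory = 0;

        /** Waits for memory; the waiter is removed even when we time out. */
        public void allocate(long size, long maxBlockMs) throws InterruptedException {
            lock.lock();
            try {
                if (availableMemory >= size) {
                    availableMemory -= size;
                    return;
                }
                Condition moreMemory = lock.newCondition();
                waiters.addLast(moreMemory);
                try {
                    // (real code would loop and re-check; kept minimal here)
                    if (!moreMemory.await(maxBlockMs, TimeUnit.MILLISECONDS)) {
                        // Without the finally below, the condition would stay in the
                        // deque forever on this path -- the issue described above.
                        throw new IllegalStateException(
                            "Failed to allocate memory within the configured max blocking time");
                    }
                    availableMemory -= size;
                } finally {
                    waiters.remove(moreMemory); // always clean up the waiter
                }
            } finally {
                lock.unlock();
            }
        }

        public void release(long size) {
            lock.lock();
            try {
                availableMemory += size;
                Condition first = waiters.peekFirst();
                if (first != null)
                    first.signal(); // hand the freed memory to the longest waiter
            } finally {
                lock.unlock();
            }
        }
    }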

RE: Hash partition of key with skew

2016-05-03 Thread Tauzell, Dave
Ok, I see what you are doing. Unless you have 1500 partitions and 1500 consumers, you will have consumers getting records for different keys and will have to deal with the problem. If you can have 1500 consumers and partitions, it will simplify your processing. -Dave Dave Tauzell | Senior

Re: Suggestions of pulling local application logs into Kafka

2016-05-03 Thread Banias H
I should add Flume is not an option for various reasons. On Tue, May 3, 2016 at 2:49 PM, Banias H wrote: > We use Kafka (0.9.x) internally in our pipeline and now we would like to > ingest application logs sitting in local file system of servers external to > the Kafka

Suggestions of pulling local application logs into Kafka

2016-05-03 Thread Banias H
We use Kafka (0.9.x) internally in our pipeline and now we would like to ingest application logs sitting in local file system of servers external to the Kafka cluster. We could write a producer program running on the application servers to push files to Kafka. However we wonder if we can leverage
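
If you do end up writing a producer program, a bare-bones sketch of pushing a log file into a topic (hypothetical file path and topic name, 0.9 Java producer, no rotation handling or error handling):

    import java.io.BufferedReader;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class LogFileProducerSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
                 BufferedReader reader = Files.newBufferedReader(
                     Paths.get("/var/log/myapp/app.log"), StandardCharsets.UTF_8)) { // hypothetical path
                String line;
                while ((line = reader.readLine()) != null) {
                    // Send each log line as one message; "app-logs" is a hypothetical topic.
                    producer.send(new ProducerRecord<>("app-logs", line));
                }
            }
        }
    }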

Re: Hash partition of key with skew

2016-05-03 Thread Srikanth
So, there are a few consumers. One is a spark streaming job where we can do a partitionBy(key) and take a slight hit. There are two consumers which are just java apps, multiple instances running in Marathon. One consumer reads records, does basic checks, buffers records on local disk and uploads to

Re: kafka 0.9 offset unknown after cleanup

2016-05-03 Thread Jun MA
Thanks for your reply. I checked the offset topic and the cleanup policy is actually compact. Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:3 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=uncompressed And I’m using 0.9.0.1 so the default

Re: Hash partition of key with skew

2016-05-03 Thread Stephen Powis
Having difficulty following the use cases behind this thread so this could be entirely noise... but I'll toss this out there in case it's relevant: https://issues.apache.org/jira/browse/KAFKA- We had an issue where a producer was setting the same key on every message but distributing the

Re: Receiving ILLEGAL_GENERATION, but I can't find information on the exception.

2016-05-03 Thread Richard L. Burton III
Thank you Dana and David! That was my problem. The data stored in Kafka is very small ~2100 bytes per record, but the work being done to process each message takes time so the consumer doesn't get a chance to poll for new messages within the request duration. What I did was decrease the fetch

RE: Hash partition of key with skew

2016-05-03 Thread Tauzell, Dave
Srikanth, I think the most efficient use of the partitions would be to spread all messages evenly across all partitions by *not* using a key. Then, all of your consumers in the same consumer group would receive about equal numbers of messages. What will you do with the messages as you pull

Re: Hash partition of key with skew

2016-05-03 Thread Srikanth
Jens, Thanks for the link. That is something to consider. Of course it has downsides too. Wesley, That is some good info on hashing. We've explored a couple of these options. I see that you are hesitant to put these in production. We too want to evaluate our options first. Dave, Our need to do

Re: Encryption at Rest

2016-05-03 Thread Jim Hoagland
MG>curious if Jim tested his encryption/decryption scenario on Kafka's stateless broker? MG>Jims idea could work if you want to implement a new serializer/deserializer for every new supported cipher Not sure if I understand. We didn't modify Kafka at all. I definitely recommend batching events

Re: Adding a broker

2016-05-03 Thread Mudit Agarwal
It will store new messages only; however, I think you can migrate your old replicas to the new broker. Thanks, Mudit From: Jens Rantil To: "users@kafka.apache.org" Sent: Tuesday, 3 May 2016 6:21 PM Subject: Adding a broker Hi, When I added a

RE: Hash partition of key with skew

2016-05-03 Thread Tauzell, Dave
Yeah, that makes sense for the target system (Cassandra for example), but I don't see that you would need that for Kafka. Good info on hashing, though, that I am going to take a look at when I get time. -Dave Dave Tauzell | Senior Software Engineer | Surescripts O: 651.855.3042 |

Re: Hash partition of key with skew

2016-05-03 Thread Wesley Chow
I’m not the OP, but in our case, we sometimes want data locality. For example, suppose that we have 100 consumers that are building up a cache of customer -> data mapping. If customer data is spread randomly across all partitions then a query for that customer’s data would have to hit all 100

RE: Hash partition of key with skew

2016-05-03 Thread Tauzell, Dave
Do you need the messages to be ordered in some way? Why pass a key if you don't want all the messages to go to one partition? -Dave Dave Tauzell | Senior Software Engineer | Surescripts O: 651.855.3042 | www.surescripts.com | dave.tauz...@surescripts.com Connect with us: Twitter I LinkedIn

Re: Invalid TimeStamp Error while running WordCountDemo - kafka-0.10.0

2016-05-03 Thread Guozhang Wang
So do you mean even with the 0.10.0 console producer producing to the input topic (assuming it is a new topic and hence there is no old data produced to it from tech-preview console producer) without the one-line fix, you still see the issue? Guozhang On Tue, May 3, 2016 at 7:10 AM, Ramanan,

Re: Hash partition of key with skew

2016-05-03 Thread Wesley Chow
I’ve come up with a couple of solutions since we too have a power-law distribution. However, we have not put anything into practice. Fixed Slicing: One simple thing to do is to take each key and slice it into some fixed number of partitions. So your function might be: (hash(key) % num) +
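
A rough sketch of what such a fixed-slicing partitioner could look like against the 0.9 producer Partitioner interface; the slice count and the random offset within a key's slice range are assumptions, not the exact (truncated) formula above:

    import java.util.Map;
    import java.util.concurrent.ThreadLocalRandom;
    import org.apache.kafka.clients.producer.Partitioner;
    import org.apache.kafka.common.Cluster;

    /**
     * Sketch: spread each key over a fixed number of "slices" (sub-partitions)
     * so a single hot key does not pin all of its traffic to one partition.
     * Configured on the producer via partitioner.class.
     */
    public class FixedSlicePartitioner implements Partitioner {
        private static final int SLICES = 4; // assumed slice count per key

        @Override
        public void configure(Map<String, ?> configs) { }

        @Override
        public int partition(String topic, Object key, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            int numPartitions = cluster.partitionCountForTopic(topic);
            if (keyBytes == null) {
                // No key: fall back to random assignment.
                return ThreadLocalRandom.current().nextInt(numPartitions);
            }
            // Base partition for this key, plus a random offset within its slice range.
            int base = hash(keyBytes) % numPartitions;
            int slice = ThreadLocalRandom.current().nextInt(SLICES);
            return (base + slice) % numPartitions;
        }

        @Override
        public void close() { }

        private static int hash(byte[] bytes) {
            int h = 1;
            for (byte b : bytes) h = 31 * h + b;
            return h & 0x7fffffff; // keep it non-negative
        }
    }

The trade-off is that a consumer interested in a single key now has to read up to SLICES partitions for it, which ties into the data-locality discussion elsewhere in this thread.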

WARN Failed to send SSL Close message

2016-05-03 Thread ????
WARN Failed to send SSL Close message (org.apache.kafka.common.network.SslTransportLayer) java.io.IOException: Unexpected status returned by SSLEngine.wrap, expected CLOSED, received OK. Will not send close message to peer. at

RE: Invalid TimeStamp Error while running WordCountDemo - kafka-0.10.0

2016-05-03 Thread Ramanan, Buvana (Nokia - US)
Guozhang, For the bug scenario, I initially produced to the topic using the console producer of confluent's alpha release (preview for streams). And later I produced to it using the console producer in version 0.10.0. But yesterday after the fix, I created a new input topic, produced to it (with

Re: Receiving ILLEGAL_GENERATION, but I can't find information on the exception.

2016-05-03 Thread vinay sharma
Hi, As already pointed out by David and Dana, your process is taking time processing polled records. This long processing time causes your consumer's session to time out. To keep the session alive, the consumer must send a heartbeat request within the specified session timeout. A heartbeat request is
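
One workaround sometimes used with the 0.9 consumer (which only sends heartbeats from inside poll()) is to pause the assigned partitions and keep calling poll() while the slow work runs; a rough sketch under that assumption, with hypothetical topic/group names and the 0.9 varargs pause/resume API, error handling omitted:

    import java.util.Collections;
    import java.util.Properties;
    import java.util.Set;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class SlowProcessingConsumerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "slow-workers");           // hypothetical group
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("work-items")); // hypothetical topic

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                if (records.isEmpty())
                    continue;

                // Stop fetching while we grind through this batch...
                Set<TopicPartition> assigned = consumer.assignment();
                consumer.pause(assigned.toArray(new TopicPartition[0]));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);    // slow per-record work
                    consumer.poll(0);   // paused, so returns nothing, but keeps heartbeating
                }
                consumer.resume(assigned.toArray(new TopicPartition[0]));
                consumer.commitSync();
            }
        }

        private static void process(ConsumerRecord<String, String> record) {
            // placeholder for the expensive per-record work
        }
    }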

Re: Hash partition of key with skew

2016-05-03 Thread Jens Rantil
Hi, Not sure if this helps, but the way Loggly seem to do it is to have a separate topic for "noisy neighbors". See [1]. [1] https://www.loggly.com/blog/loggly-loves-apache-kafka-use-unbreakable-messaging-better-log-management/ Cheers, Jens On Wed, Apr 27, 2016 at 9:11 PM Srikanth

Adding a broker

2016-05-03 Thread Jens Rantil
Hi, When I added a replicated broker to a cluster, will it first stream historical logs from the master? Or will it simply start storing new messages from producers? Thanks, Jens -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you

Re: How to define multiple serializers in kafka?

2016-05-03 Thread Gerard Klijs
Then you're probably best off using the confluent schema registry; you can then use the io.confluent.kafka.serializers.KafkaAvroDeserializer for the client with KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG="true" to get back the object, deserialized with the same version of the schema
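
For reference, a hedged sketch of the consumer properties involved; the key names match the Confluent serializer configs mentioned above, and the registry URL and group name are placeholders:

    import java.util.Properties;

    public class AvroConsumerConfigSketch {
        public static Properties build() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "avro-consumers");        // hypothetical group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
            // Where the deserializer looks up schema versions:
            props.put("schema.registry.url", "http://localhost:8081"); // placeholder URL
            // KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG: return generated
            // SpecificRecord classes instead of GenericRecord.
            props.put("specific.avro.reader", "true");
            return props;
        }
    }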

RE: Encryption at Rest

2016-05-03 Thread Martin Gainty
MG>hopefully quick comment > Subject: Re: Encryption at Rest > From: bruno.rassae...@novazone.be > Date: Tue, 3 May 2016 08:55:52 +0200 > To: users@kafka.apache.org > > From what I understand, when using batch compression in Kafka, the files are > stored compressed. > Don’t really see the

Re: How to define multiple serializers in kafka?

2016-05-03 Thread Ratha v
I plan to use a different topic for each type of object (number of object types = number of topics). So I need as many deserializers/serializers as topics (= number of objects). What would be the better way to achieve this? On 3 May 2016 at 18:20, Gerard Klijs wrote: > If you put

Re: How to define multiple serializers in kafka?

2016-05-03 Thread Gerard Klijs
If you put them in one topic, you will need one 'master' serializer/deserializers which can handle all the formats. I don't know how you would like to use Avro schemas, the confluent schema registry is by default configured to handle one schema at a time for one topic, but you could configure it

Re: kafka 0.9 offset unknown after cleanup

2016-05-03 Thread Gerard Klijs
Looks like it, you need to be sure the offset topic is using compaction, and the broker is set to enable compaction. On Tue, May 3, 2016 at 9:56 AM Jun MA wrote: > Hi, > > I’m using 0.9.0.1 new-consumer api. I noticed that after kafka cleans up > all old log

Re: Filter plugins in Kafka

2016-05-03 Thread Subramanian Karunanithi
Thanks everyone, shall try these options. Regards, Subramanian. K On Mon, May 2, 2016 at 9:43 AM, Andrew Otto wrote: > If you want something really simple and hacky, you could use kafkatee[1] > and kafkacat[2] together: > > kafkatee.conf: > > input [encoding=string] pipe

kafka 0.9 offset unknown after cleanup

2016-05-03 Thread Jun MA
Hi, I’m using 0.9.0.1 new-consumer api. I noticed that after kafka cleans up all old log segments(reach delete.retention time), I got unknown offset. bin/kafka-consumer-groups.sh --bootstrap-server server:9092 --new-consumer --group testGroup --describe GROUP, TOPIC, PARTITION, CURRENT OFFSET,

Re: Encryption at Rest

2016-05-03 Thread Bruno Rassaerts
From what I understand, when using batch compression in Kafka, the files are stored compressed. Don’t really see the difference between compression and encryption in that aspect. If Kafka supported pluggable algorithms for compression (it already supports two), it would be rather