Re: Cache Memory Kafka Process

2015-07-28 Thread Ewen Cheslack-Postava
Nilesh, It's expected that a lot of memory is used for cache. This makes sense because under the hood, Kafka mostly just reads and writes data to/from files. While Kafka does manage some in-memory data, mostly it is writing produced data (or replicated data) to log files and then serving those

Re: New consumer - offset one gets in poll is not offset one is supposed to commit

2015-07-28 Thread Stevo Slavić
Hello Jason, Thanks for the reply! About your proposal, in the general case it might be helpful. In my case it will not help much - I'm allowing each ConsumerRecord or subset of ConsumerRecords to be processed and ACKed independently and outside of the HLC process/thread (so as not to block the partition), and then

Re: New consumer - offset one gets in poll is not offset one is supposed to commit

2015-07-28 Thread tao xiao
Correct me if I'm wrong. If compaction is used, +1 to indicate the next offset is no longer valid. For the compacted section the offsets do not increase sequentially. I think you need to look at the next offset of the last processed record to figure out what the next offset will be. On Wed, 29 Jul 2015

kafka-python message offset?

2015-07-28 Thread Keith Wiley
I haven’t found a way to specify that a consumer should read from the beginning, or from any other explicit offset, or that the offset should be “reset” in any way. The command-line shell scripts (which I believe simply wrap the Scala tools) have flags for this sort of thing. Is there any way

Re: kafka-python message offset?

2015-07-28 Thread Dana Powers
Hi Keith, you can use the `auto_offset_reset` parameter of kafka-python's KafkaConsumer. It behaves the same as the Java consumer configuration of the same name. See http://kafka-python.readthedocs.org/en/latest/apidoc/kafka.consumer.html#kafka.consumer.KafkaConsumer.configure for more details on
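
For reference, a minimal sketch of the Java-side setting Dana refers to, assuming the 0.9+ Java KafkaConsumer API; the broker address, group id, and topic name below are placeholders, and the older high-level consumer spells the values "smallest"/"largest" rather than "earliest"/"latest":

    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ReadFromBeginning {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("group.id", "my-group");                // placeholder group id
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            // If the group has no committed offset yet, start from the beginning of the log.
            props.put("auto.offset.reset", "earliest");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Arrays.asList("my-topic"));    // placeholder topic

            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.offset() + ": " + record.value());
            }
            consumer.close();
        }
    }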

Re: New consumer - offset one gets in poll is not offset one is supposed to commit

2015-07-28 Thread Jay Kreps
It seems less weird if you think of the offset as the position of the consumer, i.e. it is on record 5. In some sense the consumer is actually in between records, i.e. if it has processed 4 and not processed 5, do you think of your position as being on 4 or on 5? Well, not on 4, because it already
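
For reference, a minimal sketch of this convention with the new Java consumer (assuming the 0.9+ KafkaConsumer API; broker, group, and topic names are placeholders): the offset committed after processing a record is record.offset() + 1, i.e. the consumer's position, the next record it will read.

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class CommitNextPosition {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("group.id", "my-group");                // placeholder group id
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Arrays.asList("my-topic"));    // placeholder topic

            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                // ... process record.value() here ...
                // Commit the consumer's *position*: the offset of the next record
                // to read, i.e. the offset just processed plus one.
                consumer.commitSync(Collections.singletonMap(
                        new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1)));
            }
            consumer.close();
        }
    }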

Kafka new producer thread safe question

2015-07-28 Thread Tao Feng
Hi Kafka experts, I am trying to understand how the new Kafka producer works. From the new producer code/comments, it indicates the producer is thread safe, so we could have multiple clients per KafkaProducer. Is RecordAccumulator the only place that takes care of thread synchronization for
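
For reference, a rough sketch of what the thread-safety claim means in practice; broker and topic names are placeholders, and this is only an illustration of sharing one producer across threads, not a description of the producer's internals:

    import java.util.Properties;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SharedProducer {
        public static void main(String[] args) throws InterruptedException {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            // One producer instance shared by every thread; send() is safe to call
            // concurrently, with per-partition batching coordinated inside the client.
            final KafkaProducer<String, String> producer = new KafkaProducer<>(props);

            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 4; i++) {
                final int id = i;
                pool.submit(() -> {
                    for (int n = 0; n < 100; n++) {
                        producer.send(new ProducerRecord<>("my-topic",  // placeholder topic
                                "thread-" + id, "message-" + n));
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            producer.close();
        }
    }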

Re: Number of kafka topics/partitions supported per cluster of n nodes

2015-07-28 Thread Prabhjot Bharaj
I would be using the servers available at my place of work. I don't have access to AWS servers. I would be starting off with a small number of nodes in the cluster and then plot a graph with the x-axis as the number of servers in the cluster and the y-axis as the number of topics with partitions, before the

Re: Number of kafka topics/partitions supported per cluster of n nodes

2015-07-28 Thread JIEFU GONG
I think these would definitely be useful statistics to have and I've tried to do similar tests! The biggest difference is probably going to be the hardware specs on whatever cluster you decide to run it on. Maybe benchmarks performed on different AWS servers would be helpful too, but I'd like to

Re: Number of kafka topics/partitions supported per cluster of n nodes

2015-07-28 Thread Prabhjot Bharaj
@Jiefu Gong, Are the results of your tests available publicly? Regards, Prabhjot

KafkaConfigurationError: No topics or partitions configured

2015-07-28 Thread Keith Wiley
I'm trying to get a basic consumer off the ground. I can create the consumer but I can't do anything at the message level:

    consumer = KafkaConsumer(topic, group_id=group_id, bootstrap_servers=[ip + ":" + port])
    for m in consumer:
        print m

Kafka Mirror Maker

2015-07-28 Thread Prabhjot Bharaj
Hi, I'm using Mirror Maker with a cluster of 3 nodes and a cluster of 5 nodes. I would like to ask - is the number of nodes a restriction for Mirror Maker? Also, are there any other restrictions or properties that should be common across both clusters so that they continue mirroring? I'm

Understanding the Simple Consumer

2015-07-28 Thread JIEFU GONG
Hi all, Currently working on building some tools revolving around the Simple Consumer, and I was wondering if anyone had any good documentation or examples for it? I am aware that https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example exists, but it doesn't give any
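
For reference, a heavily condensed sketch along the lines of that wiki example; host, port, topic, partition, and starting offset are placeholders, and leader lookup, error handling, and offset management are all omitted:

    import java.nio.ByteBuffer;
    import kafka.api.FetchRequest;
    import kafka.api.FetchRequestBuilder;
    import kafka.javaapi.FetchResponse;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.message.MessageAndOffset;

    public class SimpleConsumerSketch {
        public static void main(String[] args) throws Exception {
            // The host/port must be the leader for the topic-partition being fetched.
            SimpleConsumer consumer =
                    new SimpleConsumer("localhost", 9092, 100000, 64 * 1024, "demo-client");

            String topic = "my-topic";   // placeholder topic
            int partition = 0;           // placeholder partition
            long readOffset = 0L;        // placeholder starting offset

            FetchRequest req = new FetchRequestBuilder()
                    .clientId("demo-client")
                    .addFetch(topic, partition, readOffset, 100000)
                    .build();
            FetchResponse response = consumer.fetch(req);

            for (MessageAndOffset messageAndOffset : response.messageSet(topic, partition)) {
                ByteBuffer payload = messageAndOffset.message().payload();
                byte[] bytes = new byte[payload.limit()];
                payload.get(bytes);
                System.out.println(messageAndOffset.offset() + ": " + new String(bytes, "UTF-8"));
            }
            consumer.close();
        }
    }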

Re: Cache Memory Kafka Process

2015-07-28 Thread Nilesh Chhapru
Hi Ewen, I am using 3 brokers with 12 topics and about 120-125 partitions without any replication, and the message size is approx 15MB/message. The problem is that when the cache memory increases and reaches the max available, the performance starts degrading. Also I am using a Storm spout as

Re: KafkaConfigurationError: No topics or partitions configured

2015-07-28 Thread JIEFU GONG
This won't be very helpful as I am not too experienced with Python or your use case, but Java consumers do indeed have to create a String out of the byte array returned from a successful consumption, like: String actualmsg = new String(messageAndMetadata.message()) Consult something like this as

Re: KafkaConfigurationError: No topics or partitions configured

2015-07-28 Thread Dana Powers
Hi Keith, kafka-python raises FailedPayloadsError on unspecified server failures. Typically this is caused by a server exception that results in a 0 byte response. Have you checked your server logs? -Dana On Tue, Jul 28, 2015 at 2:01 PM, JIEFU GONG jg...@berkeley.edu wrote: This won't be

Re: KafkaConfigurationError: No topics or partitions configured

2015-07-28 Thread JIEFU GONG
Can you confirm that there are indeed messages in the topic that you published to?

    bin/kafka-console-consumer.sh --zookeeper [details] --topic [topic] --from-beginning

That should be the right command, and you can use that to first verify that messages have indeed been published to the topic in

Re: KafkaConfigurationError: No topics or partitions configured

2015-07-28 Thread Keith Wiley
Thank you. It looks like I had the 'topic' slightly wrong. I didn't realize it was case-sensitive. I got past that error, but now I'm bumping up against another error:

    Traceback (most recent call last):
      ...
      File /usr/local/lib/python2.7/dist-packages/kafka/consumer/kafka.py, line 59, in