Re: Is kafka suitable for our architecture?

2014-10-09 Thread Albert Vila
Hi, We process data in real time, and we are taking a look at Storm and Spark Streaming too. However, our actions are atomic, done at a document level, so I don't know if they fit something like Storm/Spark. Regarding what Christian said, isn't Kafka used for scenarios like the one I described

Re: Auto Purging Consumer Group Configuration [Especially Kafka Console Group]

2014-10-09 Thread Bhavesh Mistry
We just want to clean up old configuration from ZK. We can check via the offset API and delete based on offset, is that right? And there is no last-used date associated with a consumer group in the ZK configuration, is that right? Thanks, Bhavesh On Thu, Oct 9, 2014 at 9:23 PM, Gwen Shapira wr

Re: Clarification about Custom Encoder/Decoder for serialization

2014-10-09 Thread Abraham Jacob
Thanks Jun. Appreciate your quick response. Once the encoder is instantiated, is it possible to get a reference to it? I tried to see if I could get it through anything that the Producer exposes. Apparently not... -abe On Thu, Oct 9, 2014 at 9:28 PM, Jun Rao wrote: > The encoder is instantiate

Re: Clarification about Custom Encoder/Decoder for serialization

2014-10-09 Thread Jun Rao
The encoder is instantiated once when the producer is constructed. Thanks, Jun On Thu, Oct 9, 2014 at 6:45 PM, Abraham Jacob wrote: > Hi All, > > I wanted to get some clarification on Kafka's Encoder/Decoder usage. > > Let's say I want to implement a custom Encoder. > > public class CustomMessa

Re: Auto Purging Consumer Group Configuration [Especially Kafka Console Group]

2014-10-09 Thread Gwen Shapira
The problem with Kafka is that we never know when a consumer is "truly" inactive. But if you decide to define inactive as a consumer whose last offset is lower than anything available on the log (or perhaps lagging by over X messages?), it's fairly easy to write a script to detect and clean them di
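The definition of "inactive" suggested in this message can be sketched as a small filter. This is only an illustration under stated assumptions: the function name `find_inactive_groups` and all of its inputs are hypothetical, and the offsets are assumed to have been fetched beforehand (for example from the consumer offset znodes in ZK and from the offset API). Nothing here talks to Kafka or ZooKeeper, and nothing is deleted.

```python
from typing import Dict, List, Optional


def find_inactive_groups(
    committed: Dict[str, Dict[int, int]],
    earliest: Dict[int, int],
    latest: Optional[Dict[int, int]] = None,
    max_lag: Optional[int] = None,
) -> List[str]:
    """Return groups whose every committed offset looks stale.

    committed: group -> {partition -> last committed offset} (hypothetical input)
    earliest:  partition -> earliest offset still available on the log
    latest:    partition -> log end offset (only needed with max_lag)
    max_lag:   optional "lagging by over X messages" threshold
    """

    def stale(partition: int, offset: int) -> bool:
        # Stale if the offset has already fallen off the start of the log.
        if offset < earliest[partition]:
            return True
        # Optionally also stale if the group lags by more than max_lag.
        if max_lag is not None and latest is not None:
            return latest[partition] - offset > max_lag
        return False

    inactive = []
    for group, offsets in sorted(committed.items()):
        if not offsets:
            continue  # nothing committed, so there is nothing to judge
        if all(stale(p, o) for p, o in offsets.items()):
            inactive.append(group)
    return inactive


# Made-up offsets for one topic with two partitions.
committed = {
    "console-consumer-1": {0: 5, 1: 7},
    "billing": {0: 120, 1: 98},
}
earliest = {0: 100, 1: 90}
print(find_inactive_groups(committed, earliest))  # -> ['console-consumer-1']
```

A group is only reported when all of its partitions look stale, which errs on the side of keeping groups. Whatever does the actual cleanup should still be reviewed by hand, since, as noted above, there is no way to know that a consumer is truly gone.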

Auto Purging Consumer Group Configuration [Especially Kafka Console Group]

2014-10-09 Thread Bhavesh Mistry
Hi Kafka, We have lots of lingering console consumer groups that people created for one-time testing or debugging via bin/kafka-console-consumer.sh. Is there an auto-purging or cleanup script that Kafka provides? Is there any API to find inactive consumer groups and delete consume

Clarification about Custom Encoder/Decoder for serialization

2014-10-09 Thread Abraham Jacob
Hi All, I wanted to get some clarification on Kafka's Encoder/Decoder usage. Let's say I want to implement a custom Encoder. public class CustomMessageSerializer implements Encoder { @Override public byte[] toBytes(String arg0) { // serialize the MyCustomObject return serializedCustomObject ; }

including KAFKA-1555 in 0.8.2?

2014-10-09 Thread Jun Rao
Hi, Everyone, I just committed KAFKA-1555 (min.isr support) to trunk. I felt that it's probably useful to include it in the 0.8.2 release. Any objections? Thanks, Jun

Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-09 Thread Neha Narkhede
With SimpleConsumer, you will have to handle leader discovery as well as zookeeper based rebalancing. You can see an example here - https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example On Wed, Oct 8, 2014 at 11:45 AM, Sharninder wrote: > Thanks Gwen. This really helped.

Re: refactoring ZK so it is plugable, would this make sense?

2014-10-09 Thread S Ahmed
I want kafka features (w/o the redundancy) but don't want to have to run 3 zookeeper instances to save $$. On Thu, Oct 9, 2014 at 2:59 PM, Jun Rao wrote: > This may not be easy since you have to implement things like watcher > callbacks. What's your main concern with the ZK dependency? > > Thank

Re: create topic in multiple node kafka cluster

2014-10-09 Thread Sa Li
Hi, I kinda doubt whether I make it as an ensemble, since it shows root@DO-mq-dev:/etc/zookeeper/conf# zkServer.sh status JMX enabled by default Using config: /etc/zookeeper/conf/zoo.cfg Mode: standalone Mode is standalone instead of something else, here is my zoo.cfg, I did follow the instructi

Re: create topic in multiple node kafka cluster

2014-10-09 Thread Guozhang Wang
Sa, Usually you would not want to set up Kafka brokers on the same machines as ZK nodes, as that will add dependent failures to the server cluster. Back to your original question, it seems your ZK nodes do not form an ensemble, since otherwise their ZK data should be the same. Guozhang On Thu

Re: refactoring ZK so it is plugable, would this make sense?

2014-10-09 Thread Jun Rao
This may not be easy since you have to implement things like watcher callbacks. What's your main concern with the ZK dependency? Thanks, Jun On Thu, Oct 9, 2014 at 8:20 AM, S Ahmed wrote: > Hi, > > I was wondering if the zookeeper library (zkutils.scala etc) was designed > in a more modular wa

Re: create topic in multiple node kafka cluster

2014-10-09 Thread Joel Koshy
It looks like you set up three separate ZK clusters, not an ensemble. You can take a look at http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_zkMulitServerSetup on how to set up an ensemble, and then register all three Kafka brokers on that single ZK ensemble. Joel On Thu, Oct 09, 2

create topic in multiple node kafka cluster

2014-10-09 Thread Sa Li
Hi all, I set up a 3-node Kafka cluster on top of a 3-node ZK ensemble. Now I launch 1 broker on each node, and the brokers are randomly distributed across the ZK ensemble, see DO-mq-dev.1 [zk: localhost:2181(CONNECTED) 1] ls /brokers/ids [0, 1] pof-kstorm-dev1.2 [zk: localhost:2181(CONNECTED) 1] ls /broke

Re: MBeans, dashes, underscores, and KAFKA-1481

2014-10-09 Thread Neha Narkhede
I am going to vote for 1482 to be included in 0.8.2, if we have a patch submitted in a week. I think we've had this JIRA opened for too long and we held people back so it's only fair to release this. On Wed, Oct 8, 2014 at 9:40 PM, Jun Rao wrote: > Otis, > > Just have the patch ready asap. We ca

Re: Reassigning Partition Failing

2014-10-09 Thread Lung, Paul
Hi Joe, I simply restarted the leader broker, and things seem to work again. Thank you. Best, Paul Lung On 10/2/14, 1:26 AM, "Joe Stein" wrote: >What version of zookeeper are you running? > >First check to see if there is a znode for the >"/admin/reassign_partitions" in >zookeeper. > >If so, y

Re: Reassigning Partition Failing

2014-10-09 Thread Lung, Paul
Actually, reassigning the replica does work, even if the broker the partition resides on is dead. My problem was that there was some unknown issue with the leader. When I restarted the leader broker, it worked. Paul On 10/6/14, 11:41 AM, "Joe Stein" wrote: >Agreed, I think it is also a replace

Re: Is kafka suitable for our architecture?

2014-10-09 Thread Christian Csar
Apart from your data locality problem it sounds like what you want is a workqueue. Kafka's consumer structure doesn't lend itself too well to that use case as a single partition of a topic should only have one consumer instance per logical subscriber of the topic, and that consumer would not be abl

RE: refactoring ZK so it is plugable, would this make sense?

2014-10-09 Thread S Ahmed
Hi, I was wondering, if the zookeeper library (zkutils.scala etc) was designed in a more modular way, would it make it possible to run a more "lean" version of Kafka? The idea is I want to run Kafka but with less emphasis on it being durable with failover and more on it being a replacement for a

Re: how to identify rogue consumer

2014-10-09 Thread Jun Rao
Yes. Thanks, Jun On Wed, Oct 8, 2014 at 10:53 PM, Steven Wu wrote: > Jun, you mean trace level logging for requestAppender? > log4j.logger.kafka.network.Processor=TRACE, requestAppender > > if it happens again, I can try to enable it. > > On Wed, Oct 8, 2014 at 9:54 PM, Jun Rao wrote: > > > I

Re: Is kafka suitable for our architecture?

2014-10-09 Thread William Briggs
Manually managing data locality will become difficult to scale. Kafka is one potential tool you can use to help scale, but by itself, it will not solve your problem. If you need the data in near-real time, you could use a technology like Spark or Storm to stream data from Kafka and perform your pro

Is kafka suitable for our architecture?

2014-10-09 Thread Albert Vila
Hi, I just came across Kafka when I was trying to find solutions to scale our current architecture. We are currently downloading and processing 6M documents per day from online and social media. We have a different workflow for each type of document, but some of the steps are keyword extraction, l