Re: Message routing, Kafka-to-REST and HTTP API tools/frameworks for Kafka?

2015-03-25 Thread Nagesh
Hi,

I think for 2) you can use a Kafka consumer and push messages to the Vert.x
event bus, which already has a REST implementation (vertx-jersey).

I would also say a Vert.x cluster can be used to receive data irrespective of
topic and then publish it to a particular Kafka topic. Different consumers can
then consume the messages from Kafka and distribute them. Kafka can hold
messages without dropping them during bursts, even when the downstream slows
down.
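A rough sketch of such a Kafka-to-Vert.x bridge (assuming the new Java
consumer and Vert.x 3; the topic name, group id and event bus address below
are just examples):

import io.vertx.core.Vertx;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Collections;
import java.util.Properties;

public class KafkaToEventBusBridge {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "vertx-bridge");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("input-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    // A verticle listening on this address can then expose the
                    // data over REST (e.g. via vertx-jersey).
                    vertx.eventBus().publish("kafka.input-topic", record.value());
                }
            }
        }
    }
}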

Regards,
Nageswara Rao

On Wed, Mar 25, 2015 at 10:58 AM, Manoj Khangaonkar khangaon...@gmail.com
wrote:

 Hi,

 For (1) and perhaps even for (2), where distribution/filtering at scale is
 required, I would look at using Apache Storm with Kafka.
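 Whatever framework you end up with, the core of (1) is a consume-filter-produce
 loop. A rough, framework-independent sketch using the plain Java clients (the
 topic names and routing rules below are only examples):

 import org.apache.kafka.clients.consumer.ConsumerRecord;
 import org.apache.kafka.clients.consumer.KafkaConsumer;
 import org.apache.kafka.clients.producer.KafkaProducer;
 import org.apache.kafka.clients.producer.ProducerRecord;

 import java.util.*;

 public class KeyBasedRouter {
     public static void main(String[] args) {
         Properties props = new Properties();
         props.put("bootstrap.servers", "localhost:9092");
         props.put("group.id", "router");
         props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
         props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
         props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
         props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

         KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
         KafkaProducer<String, String> producer = new KafkaProducer<>(props);

         // message key -> list of output topics (example rules only)
         Map<String, List<String>> routes = new HashMap<>();
         routes.put("key value 1", Arrays.asList("output-topic-K"));
         routes.put("key value 2", Arrays.asList("output-topic-K", "output-topic-L"));

         consumer.subscribe(Arrays.asList("input-topic-A", "input-topic-B"));
         while (true) {
             for (ConsumerRecord<String, String> record : consumer.poll(1000)) {
                 // copy the message to every output topic its key is mapped to
                 for (String out : routes.getOrDefault(record.key(), Collections.<String>emptyList())) {
                     producer.send(new ProducerRecord<>(out, record.key(), record.value()));
                 }
             }
         }
     }
 }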

 For (3), it seems you just need REST services wrapping Kafka
 consumers/producers. I would start with the usual suspects like Jersey.
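 For example, a minimal Jersey/JAX-RS resource wrapping a Kafka producer could
 look roughly like this (a sketch only; the resource path, topic handling and
 producer config are made up, and authentication/TLS would still have to be
 added around it):

 import javax.ws.rs.*;
 import javax.ws.rs.core.MediaType;
 import javax.ws.rs.core.Response;

 import org.apache.kafka.clients.producer.KafkaProducer;
 import org.apache.kafka.clients.producer.ProducerRecord;

 import java.util.Properties;

 @Path("/topics/{topic}")
 public class ProducerResource {

     // In a real service this would be a shared, long-lived producer instance.
     private static final KafkaProducer<String, String> producer = createProducer();

     private static KafkaProducer<String, String> createProducer() {
         Properties props = new Properties();
         props.put("bootstrap.servers", "localhost:9092");
         props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
         props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
         return new KafkaProducer<>(props);
     }

     @POST
     @Consumes(MediaType.TEXT_PLAIN)
     public Response publish(@PathParam("topic") String topic, String body) {
         // Fire-and-forget send; error handling/acks omitted for brevity.
         producer.send(new ProducerRecord<String, String>(topic, body));
         return Response.accepted().build();
     }
 }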

 regards

 On Tue, Mar 24, 2015 at 12:06 PM, Valentin kafka-...@sblk.de wrote:

 
  Hi guys,
 
  We have three Kafka use cases for which we have written our own PoC
  implementations, but I am wondering whether there might be a fitting open
  source solution/tool/framework out there.
  Maybe some of you have some ideas/pointers? :)
 
  1) Message routing/distribution/filter tool
  We need to copy messages from a set of input topics to a set of output
  topics
  based on their message key values. Each message in an input topic will go
  to 0 to N output topics,
  each output topic will receive messages from 0 to N input topics.
  So basically the tool acts as a message routing component in our system.
  Example configuration:
  input topic A:output topic K:key value 1,key value 2,key value 3
  input topic A:output topic L:key value 2,key value 4
  input topic B:output topic K:key value 5,key value 6
  ...
  It would also be interesting to define distribution/filter rules based on
  regular expressions on the message key or message body.
 
  2) Kafka-to-REST Push service
  We need to consume messages from a set of topics, translate them into REST
  web service calls, and forward the data that way to existing,
  non-Kafka-aware systems with REST APIs.
 
  3) HTTP REST API for consumers and producers
  We need to expose the simple consumer and the producer functionality via
  REST web service calls, with authentication and per-topic authorization at
  the REST API level and TLS for transport encryption.
  Offset tracking is done by the connected systems, not by the
  broker/ZooKeeper/REST API.
  We expect a high message volume here in the future, so performance would
  be a key concern.
 
  Greetings
  Valentin
 



 --
 http://khangaonkar.blogspot.com/




-- 
Thanks,
Nageswara Rao.V

*The LORD reigns*


Re: Hadoop Summit Meetups

2014-06-05 Thread Nagesh
As Jun Rao said, it is quite possible for multiple publishers to publish to a
topic and for different groups of consumers to consume the messages and apply
group-specific logic, for example raw data processing, aggregation, etc. Each
distinct group will receive a copy.

But the offset cannot be used as a UUID, as the counter may reset in case you
restart Kafka for some reason. Not sure, can someone throw some light on this?

Regards,
Nageswara Rao


On Thu, Jun 5, 2014 at 8:18 PM, Jun Rao jun...@gmail.com wrote:

 It sounds like you want to write to a data store and a data pipe
 atomically. Since both the data store and the data pipe that you want to
 use are highly available, the only case that you want to protect against is
 the client failing between the two writes. One way to do that is to let the
 client publish to Kafka first with the strongest ack. Then, run a few
 consumers to read data from Kafka and write the data to the data store. Any
 one of those consumers can die and the work will be automatically picked up
 by the remaining ones. You can use the partition id and the offset of each
 message as its UUID if needed.
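 A rough sketch of the two halves (assuming the new Java clients and an
 already-configured consumer; the "events" topic, the payload variable and the
 idempotent dataStore.upsert(...) call are only placeholders):

 // 1) Publisher: write to Kafka first with the strongest ack setting.
 Properties props = new Properties();
 props.put("bootstrap.servers", "localhost:9092");
 props.put("acks", "all"); // equivalent to the old request.required.acks=-1
 props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
 props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
 KafkaProducer<String, String> producer = new KafkaProducer<>(props);
 producer.send(new ProducerRecord<String, String>("events", payload)).get(); // block until acked

 // 2) Consumer: write to the data store, keyed by the message coordinates.
 for (ConsumerRecord<String, String> record : consumer.poll(1000)) {
     String uuid = record.topic() + "-" + record.partition() + "-" + record.offset();
     dataStore.upsert(uuid, record.value()); // idempotent, so retries are safe
 }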

 Thanks,

 Jun


 On Wed, Jun 4, 2014 at 10:56 AM, Jonathan Hodges hodg...@gmail.com
 wrote:

  Sorry didn't realize the mailing list wasn't copied...
 
 
  -- Forwarded message --
  From: Jonathan Hodges hodg...@gmail.com
  Date: Wed, Jun 4, 2014 at 10:56 AM
  Subject: Re: Hadoop Summit Meetups
  To: Neha Narkhede neha.narkh...@gmail.com
 
 
  We have a number of customer facing online learning applications.  These
  applications are using heterogeneous technologies with different data
  models in underlying data stores such as RDBMS, Cassandra, MongoDB, etc.
   We would like to run offline analysis on the data contained in these
  learning applications with tools like Hadoop and Spark.
 
  One thought is to use Kafka as a way for these learning applications to
  emit data in near real-time for analytics.  We developed a common model
  represented as Avro records in HDFS that spans these learning applications
  so that we can accept the same structured message from them.  This allows
  for comparing apples to apples across these apps as opposed to messy
  transformations.
 
  So this all sounds good until you dig into the details.  One pattern is for
  these applications to update state locally in their data stores first and
  then publish to Kafka.  The problem with this is these two operations
  aren't atomic, so the local persist can succeed and the publish to Kafka
  fail, leaving the application and HDFS out of sync.  You can try to add
  some retry logic to the clients, but this quickly becomes very complicated
  and still doesn't solve the underlying problem.
 
  Another pattern is to publish to Kafka first with -1 and wait for the ack
  from the leader and replicas before persisting locally.  This is probably
  better than the other pattern but does add some complexity to the client.
  The clients must now generate unique entity IDs/UUIDs for persistence when
  they typically rely on the data store for creating these.  Also, the
  publish to Kafka can succeed and the local persist can fail, leaving the
  stores out of sync.  In this case the learning application needs to
  determine how to get itself in sync.  It can rely on getting this back from
  Kafka, but it is possible the local store failure can't be fixed in a
  timely manner, e.g. hardware failure, constraint violations, etc.  In this
  case the application needs to show an error to the user and likely needs to
  do something like send a delete message to Kafka to remove the earlier
  published message.
 
  A third, last-resort pattern might be to go the CDC route with something
  like Databus.  This would require implementing additional fetchers and
  relays to support Cassandra and MongoDB.  Also, the data will need to be
  transformed on the Hadoop/Spark side for virtually every learning
  application since they have different data models.
 
  I hope this gives enough detail to start discussing transactional messaging
  in Kafka.  We are willing to help in this effort if it makes sense for our
  use cases.
 
  Thanks
  Jonathan
 
 
 
  On Wed, Jun 4, 2014 at 9:44 AM, Neha Narkhede neha.narkh...@gmail.com
  wrote:
 
   If you are comfortable, share it on the mailing list. If not, I'm happy to
   have this discussion privately.
  
   Thanks,
   Neha
   On Jun 4, 2014 9:42 AM, Neha Narkhede neha.narkh...@gmail.com
 wrote:
  
    Glad it was useful. It will be great if you can share your requirements
    on atomicity. A couple of us are very interested in thinking about
    transactional messaging in Kafka.
  
   Thanks,
   Neha
   On Jun 4, 2014 6:57 AM, Jonathan Hodges hodg...@gmail.com wrote:
  
   Hi Neha,
  
    Thanks so much to you and the Kafka team for putting together the meetup.
    It was very nice and gave people from out of town like us the ability to
    join in person.
  
   We are the guys from Pearson Education and we 

Re: Message details

2014-06-05 Thread Nagesh
Hi,

As per the AMQP 0.9/1.0 standards, any messaging system is essentially just a
pipe (or pipes) that allows multiple producers to publish messages and allows
multiple pointers (one pointer per group) to consume them. It is up to the
messaging system to discard a message on expiry.

Since a message is shared between multiple consumer groups, the system should
not allow one particular consumer/group to remove it. The idea is that any
messaging system is just a temporary hop that can be shared by multiple
groups.

If you want to skip a message within a group, just consume and discard it at
the group level. The group-specific pointer on the topic will move on, and the
message will not be delivered to any other consumer in that group.
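Just as a rough sketch of that "consume and discard, park the failure
somewhere else" idea (assuming the new Java clients, an already-configured
consumer and producer, and a process(...) placeholder for your own handling;
the "failed-messages" topic name is made up):

// Consume; on failure, republish the message to a separate topic instead of
// trying to delete it from the original log.
for (ConsumerRecord<String, String> record : consumer.poll(1000)) {
    try {
        process(record.value()); // application-specific handling
    } catch (Exception e) {
        // The original message stays in its topic; this group's offset still
        // moves on, so this group will not see it again.
        producer.send(new ProducerRecord<String, String>("failed-messages",
                record.key(), record.value()));
    }
}
consumer.commitSync(); // advance this group's committed offsets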

Thanks,
Nageswara Rao


On Fri, Jun 6, 2014 at 3:47 AM, Guozhang Wang wangg...@gmail.com wrote:

 Hi Achanta,

 Your use case is quite interesting. If I understand correctly, you want to
 use a transaction that atomically consumes a message from a partition and
 sends it to another partition, correct? I presume that by saying "remove the
 message from that partition" you actually mean to skip consuming it in the
 future?

 Currently Kafka does not have functionality to remove a message from the
 log. And I am not sure what you mean by "access a message"?

 Guozhang


 On Thu, Jun 5, 2014 at 11:32 AM, Achanta Vamsi Subhash 
 achanta.va...@flipkart.com wrote:

  Hi,
 
  We are experimenting with Kafka for an MQ use-case. We found it very useful
  but couldn't find the following info in the documentation:

  I have consumer logic which can tell that consumption of a message failed.
  Is there any way I can remove the message from that partition and put it in
  another topic? Can this be done in a way equivalent to a database
  transaction, importantly: can I do either both or none?

  - How can I remove a message from a particular topic? Is there any
  client API for it?
  - How can I access a message? Is there any client API for it?
 
  --
  Regards
  Vamsi Subhash
 



 --
 -- Guozhang




-- 
Thanks & Regards,
Nageswara Rao.V