GenericRecord.toString produces invalid JSON for logical types

2018-08-23 Thread kant kodali
Hi All,

I have serialized Avro binary data represented by a byte[] where one of the
fields is a long with a logical type of timestamp-millis.

Timestamp tsp = new Timestamp(1530228588182L);
Schema schema = SchemaBuilder.builder()
        .record("hello")
        .fields()
        .name("tsp")
        .type(LogicalTypes.timestampMillis().addToSchema(Schema.create(Schema.Type.LONG)))
        .noDefault()
        .endRecord();
System.out.println(schema.toString());

GenericRecord genericRecord = new GenericData.Record(schema);
genericRecord.put("tsp", tsp.getTime());


I serialized the above generic record to a byte[] and used the two methods below
to deserialize it, but both of them produce invalid JSON.

public static GenericRecord deserialize(final Schema schema, byte[] data) throws IOException {
    final GenericData genericData = new GenericData();
    genericData.addLogicalTypeConversion(new TimeConversions.TimestampConversion());
    genericData.addLogicalTypeConversion(new TimeConversions.TimeConversion());
    try (final InputStream is = new ByteArrayInputStream(data)) {
        final Decoder decoder = DecoderFactory.get().binaryDecoder(is, null);
        final DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema, schema, genericData);
        return reader.read(null, decoder);
    }
}

This produces {"tsp": 2018-06-28T23:29:48.182Z}, which is not valid JSON.

So I also tried the following:

public static GenericRecord deserialize(final Schema schema, byte[] data) throws IOException {
    final GenericData genericData = new GenericData() {
        @Override
        public String toString(Object datum) {
            // These types are not quoted by default and produce a malformed JSON string,
            // so quote them here.
            if (datum instanceof java.sql.Timestamp || datum instanceof java.sql.Time
                    || datum instanceof java.sql.Date) {
                return new StringBuilder().append("\"").append(datum).append("\"").toString();
            }
            return super.toString(datum);
        }
    };
    genericData.addLogicalTypeConversion(new TimeConversions.TimestampConversion());
    genericData.addLogicalTypeConversion(new TimeConversions.TimeConversion());
    try (final InputStream is = new ByteArrayInputStream(data)) {
        final Decoder decoder = DecoderFactory.get().binaryDecoder(is, null);
        final DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema, schema, genericData);
        return reader.read(null, decoder);
    }
}


I still get {"tsp": 2018-06-28T23:29:48.182Z}, which is not valid JSON.

Expected output: {"tsp": "2018-06-28T23:29:48.182Z"}

Any ideas?
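
One way to get valid JSON output (a minimal sketch, not from the original message;
it assumes plain Avro 1.8 APIs and that the record still holds the raw long, i.e.
it was read without the logical-type conversions) is to serialize the record with
Avro's JsonEncoder instead of relying on toString():

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

public static String toJson(final Schema schema, final GenericRecord record) throws IOException {
    final ByteArrayOutputStream out = new ByteArrayOutputStream();
    final GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
    final JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
    writer.write(record, encoder);
    encoder.flush();
    // For the schema above this yields {"tsp":1530228588182}: valid JSON, with the
    // timestamp left as a long unless conversions are also wired into the writer.
    return out.toString("UTF-8");
}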

Thanks!


Re: Messages are repeating in kafka

2017-05-24 Thread kant kodali
@Abhimanyu

1) My guess is that the topic offsets will remain for 30 days, since that is the
configuration you are explicitly setting and Kafka should respect it, although I
don't know for sure.

2) Same as #1: offsets should remain for whatever time you specify.

What is the problem with setting offsets.retention.minutes equal to
log.retention.hours? Did it fix the problem you were facing before?
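
For concreteness, that would look roughly like this in server.properties
(illustrative values for 30 days):

# keep topic data for 30 days
log.retention.hours=720
# keep committed consumer offsets for 30 days as well (1440 * 30)
offsets.retention.minutes=43200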


On Tue, May 23, 2017 at 11:40 PM, Abhimanyu Nagrath <
abhimanyunagr...@gmail.com> wrote:

> Hi Kant,
>
> After setting this configuration offsets.retention.minutes . I am in doubt
> about the two things
>  1. If I am deleting a topic will that topic offsets would also get deleted
> or will they present for 30 days?
>  2. What will happen if for some topics my log.retention.hours = 168 and
> offsets.retention.minutes= 1440 * 30 ?
>
> Regards,
> Abhimanyu
>
> On Mon, May 22, 2017 at 4:09 PM, Abhimanyu Nagrath <
> abhimanyunagr...@gmail.com> wrote:
>
> > @Kant I was going through the offset related configurations before
> setting
> > offsets.retention.minutes so came accross this configuration and thought
> to
> > ask whether this should also be tuned or not.
> >
> >
> > Regards,
> > Abhimanyu
> >
> >
> >
> >
> > On Mon, May 22, 2017 at 2:24 PM, kant kodali <kanth...@gmail.com> wrote:
> >
> >> @Abhimanyu Why do you think you need to set that? Did you try setting
> >> offsets.retention.minutes
> >> = 1440 * 30 and still seeing duplicates?
> >>
> >> On Mon, May 22, 2017 at 12:37 AM, Abhimanyu Nagrath <
> >> abhimanyunagr...@gmail.com> wrote:
> >>
> >> > Hi Girish ,
> >> >
> >> > Do I need to tune this configuration offsets.retention.check.interv
> >> al.ms
> >> > also . Please let me know if I need to tune any other configuration.
> >> >
> >> >
> >> > Regards,
> >> > Abhimanyu
> >> >
> >> > On Sun, May 21, 2017 at 8:01 PM, Girish Aher <girisha...@gmail.com>
> >> wrote:
> >> >
> >> > > Yup, exactly as Kant said.
> >> > > Also make sure that the retention of the offsets topic is an upper
> >> bound
> >> > > across all topics. So in this case, don't create any other topics in
> >> the
> >> > > future with retention of more than 30 days or otherwise they may
> have
> >> the
> >> > > same problem too.
> >> > >
> >> > > On May 21, 2017 03:25, "Abhimanyu Nagrath" <
> >> abhimanyunagr...@gmail.com>
> >> > > wrote:
> >> > >
> >> > >> Hi Kant,
> >> > >>
> >> > >> Thanks for the suggestion.
> >> > >>
> >> > >>
> >> > >> Regards,
> >> > >> Abhimanyu
> >> > >>
> >> > >> On Sun, May 21, 2017 at 3:44 PM, kant kodali <kanth...@gmail.com>
> >> > wrote:
> >> > >>
> >> > >>> @Abhimanyu You can try setting offset.retention = 30
> >> (log.retention).
> >> > At
> >> > >>> most, you will have a storage overhead of 5 million msgs per day *
> >> 30
> >> > >>> (days) * 8 bytes (for each offset) = 1.2GB (not that much since
> you
> >> > have
> >> > >>> a
> >> > >>> TB of hard disk)
> >> > >>>
> >> > >>> On Sun, May 21, 2017 at 3:05 AM, kant kodali <kanth...@gmail.com>
> >> > wrote:
> >> > >>>
> >> > >>> > Looking at that ticket and reading the comments it looks like
> one
> >> of
> >> > >>> the
> >> > >>> > concern is as follows.
> >> > >>> >
> >> > >>> > "offsets.retention.minutes is designed to handle the case that a
> >> > >>> consumer
> >> > >>> > group goes away forever. In that case, we don't want to store
> the
> >> > >>> offsets
> >> > >>> > for that group forever."
> >> > >>> >
> >> > >>> > This can simply be addressed by setting offset.retention ==
> >> > >>> log.retention
> >> > >>> > by default right? In which case offset wont be stored forever
> even
> >> > when
> >> > >>> > consumer group goes away forever. When the consumer group goes
> >> away
> >> > >>> forever
> >> > >>> > the upper bound to clean up offsets would be equal to
> >> log.retention.
> >> > >>> >
> >> > >>> >
> >> > >>> >
> >> > >>> > On Sun, May 21, 2017 at 2:19 AM, kant kodali <
> kanth...@gmail.com>
> >> > >>> wrote:
> >> > >>> >
> >> > >>> >> What is your average message size and network speed?
> >> > >>> >>
> >> > >>> >> On Sun, May 21, 2017 at 2:04 AM, Abhimanyu Nagrath <
> >> > >>> >> abhimanyunagr...@gmail.com> wrote:
> >> > >>> >>
> >> > >>> >>> Hi Girish,
> >> > >>> >>>
> >> > >>> >>> I did not set any value for offsets.retention.minutes so
> >> therefore
> >> > >>> what I
> >> > >>> >>> think is picking its default value i.e 1440 minutes so what do
> >> you
> >> > >>> think
> >> > >>> >>> what should I set if I am keeping my data for 30 days?
> >> > >>> >>>
> >> > >>> >>> Regards,
> >> > >>> >>> Abhimanyu
> >> > >>> >>>
> >> > >>> >>
> >> > >>> >>
> >> > >>> >
> >> > >>>
> >> > >>
> >> > >>
> >> >
> >>
> >
> >
>


Re: Messages are repeating in kafka

2017-05-22 Thread kant kodali
@Abhimanyu Why do you think you need to set that? Did you try setting
offsets.retention.minutes
= 1440 * 30 and still seeing duplicates?

On Mon, May 22, 2017 at 12:37 AM, Abhimanyu Nagrath <
abhimanyunagr...@gmail.com> wrote:

> Hi Girish ,
>
> Do I need to tune this configuration offsets.retention.check.interval.ms
> also . Please let me know if I need to tune any other configuration.
>
>
> Regards,
> Abhimanyu
>
> On Sun, May 21, 2017 at 8:01 PM, Girish Aher <girisha...@gmail.com> wrote:
>
> > Yup, exactly as Kant said.
> > Also make sure that the retention of the offsets topic is an upper bound
> > across all topics. So in this case, don't create any other topics in the
> > future with retention of more than 30 days or otherwise they may have the
> > same problem too.
> >
> > On May 21, 2017 03:25, "Abhimanyu Nagrath" <abhimanyunagr...@gmail.com>
> > wrote:
> >
> >> Hi Kant,
> >>
> >> Thanks for the suggestion.
> >>
> >>
> >> Regards,
> >> Abhimanyu
> >>
> >> On Sun, May 21, 2017 at 3:44 PM, kant kodali <kanth...@gmail.com>
> wrote:
> >>
> >>> @Abhimanyu You can try setting offset.retention = 30 (log.retention).
> At
> >>> most, you will have a storage overhead of 5 million msgs per day * 30
> >>> (days) * 8 bytes (for each offset) = 1.2GB (not that much since you
> have
> >>> a
> >>> TB of hard disk)
> >>>
> >>> On Sun, May 21, 2017 at 3:05 AM, kant kodali <kanth...@gmail.com>
> wrote:
> >>>
> >>> > Looking at that ticket and reading the comments it looks like one of
> >>> the
> >>> > concern is as follows.
> >>> >
> >>> > "offsets.retention.minutes is designed to handle the case that a
> >>> consumer
> >>> > group goes away forever. In that case, we don't want to store the
> >>> offsets
> >>> > for that group forever."
> >>> >
> >>> > This can simply be addressed by setting offset.retention ==
> >>> log.retention
> >>> > by default right? In which case offset wont be stored forever even
> when
> >>> > consumer group goes away forever. When the consumer group goes away
> >>> forever
> >>> > the upper bound to clean up offsets would be equal to log.retention.
> >>> >
> >>> >
> >>> >
> >>> > On Sun, May 21, 2017 at 2:19 AM, kant kodali <kanth...@gmail.com>
> >>> wrote:
> >>> >
> >>> >> What is your average message size and network speed?
> >>> >>
> >>> >> On Sun, May 21, 2017 at 2:04 AM, Abhimanyu Nagrath <
> >>> >> abhimanyunagr...@gmail.com> wrote:
> >>> >>
> >>> >>> Hi Girish,
> >>> >>>
> >>> >>> I did not set any value for offsets.retention.minutes so therefore
> >>> what I
> >>> >>> think is picking its default value i.e 1440 minutes so what do you
> >>> think
> >>> >>> what should I set if I am keeping my data for 30 days?
> >>> >>>
> >>> >>> Regards,
> >>> >>> Abhimanyu
> >>> >>>
> >>> >>
> >>> >>
> >>> >
> >>>
> >>
> >>
>


Re: Messages are repeating in kafka

2017-05-21 Thread kant kodali
@Abhimanyu You can try setting offset.retention = 30 (log.retention). At
most, you will have a storage overhead of 5 million msgs per day * 30
(days) * 8 bytes (for each offset) = 1.2GB (not that much since you have a
TB of hard disk)

On Sun, May 21, 2017 at 3:05 AM, kant kodali <kanth...@gmail.com> wrote:

> Looking at that ticket and reading the comments it looks like one of the
> concern is as follows.
>
> "offsets.retention.minutes is designed to handle the case that a consumer
> group goes away forever. In that case, we don't want to store the offsets
> for that group forever."
>
> This can simply be addressed by setting offset.retention == log.retention
> by default right? In which case offset wont be stored forever even when
> consumer group goes away forever. When the consumer group goes away forever
> the upper bound to clean up offsets would be equal to log.retention.
>
>
>
> On Sun, May 21, 2017 at 2:19 AM, kant kodali <kanth...@gmail.com> wrote:
>
>> What is your average message size and network speed?
>>
>> On Sun, May 21, 2017 at 2:04 AM, Abhimanyu Nagrath <
>> abhimanyunagr...@gmail.com> wrote:
>>
>>> Hi Girish,
>>>
>>> I did not set any value for offsets.retention.minutes so therefore what I
>>> think is picking its default value i.e 1440 minutes so what do you think
>>> what should I set if I am keeping my data for 30 days?
>>>
>>> Regards,
>>> Abhimanyu
>>>
>>
>>
>


Re: Messages are repeating in kafka

2017-05-21 Thread kant kodali
Looking at that ticket and reading the comments it looks like one of the
concern is as follows.

"offsets.retention.minutes is designed to handle the case that a consumer
group goes away forever. In that case, we don't want to store the offsets
for that group forever."

This can simply be addressed by setting offset.retention == log.retention
by default right? In which case offset wont be stored forever even when
consumer group goes away forever. When the consumer group goes away forever
the upper bound to clean up offsets would be equal to log.retention.



On Sun, May 21, 2017 at 2:19 AM, kant kodali <kanth...@gmail.com> wrote:

> What is your average message size and network speed?
>
> On Sun, May 21, 2017 at 2:04 AM, Abhimanyu Nagrath <
> abhimanyunagr...@gmail.com> wrote:
>
>> Hi Girish,
>>
>> I did not set any value for offsets.retention.minutes so therefore what I
>> think is picking its default value i.e 1440 minutes so what do you think
>> what should I set if I am keeping my data for 30 days?
>>
>> Regards,
>> Abhimanyu
>>
>
>


Re: Messages are repeating in kafka

2017-05-21 Thread kant kodali
What is your average message size and network speed?

On Sun, May 21, 2017 at 2:04 AM, Abhimanyu Nagrath <
abhimanyunagr...@gmail.com> wrote:

> Hi Girish,
>
> I did not set any value for offsets.retention.minutes so therefore what I
> think is picking its default value i.e 1440 minutes so what do you think
> what should I set if I am keeping my data for 30 days?
>
> Regards,
> Abhimanyu
>


Re: is anyone able to create 1M or 10M or 100M or 1B partitions in a topic?

2017-05-16 Thread kant kodali
Got it! But do you mean directories or files? I thought partition = file and
topic = directory.

If there are 1 partitions in a topic that means there are 1 files
in one directory?

On Tue, May 16, 2017 at 2:29 AM, Sameer Kumar <sam.kum.w...@gmail.com>
wrote:

> I don't think this would be the right approach. from broker side, this
> would mean creating 1M/10M/100M/1B directories, this would be too much for
> the file system itself.
>
> For most cases, even some thousand partitions per node should be
> sufficient.
>
> For more details, please refer to
> https://www.confluent.io/blog/how-to-choose-the-number-of-
> topicspartitions-in-a-kafka-cluster/
>
> -Sameer.
>
> On Tue, May 16, 2017 at 2:40 PM, kant kodali <kanth...@gmail.com> wrote:
>
> > Forgot to mention: The question in this thread is for one node which has
> 8
> > CPU's 16GB RAM & 500GB hard disk space.
> >
> > On Tue, May 16, 2017 at 2:06 AM, kant kodali <kanth...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > 1. I was wondering if anyone has seen or heard or able to create 1M or
> > 10M
> > > or 100M or 1B partitions in a topic? I understand lot of this depends
> on
> > > filesystem limitations (we are using ext4) and the OS limitations but I
> > > just would like to know what is the scale one had seen in production?
> > > 2. Is it advisable?
> > >
> > > Thanks!
> > >
> >
>


Re: is anyone able to create 1M or 10M or 100M or 1B partitions in a topic?

2017-05-16 Thread kant kodali
Forgot to mention: The question in this thread is for one node which has 8
CPU's 16GB RAM & 500GB hard disk space.

On Tue, May 16, 2017 at 2:06 AM, kant kodali <kanth...@gmail.com> wrote:

> Hi All,
>
> 1. I was wondering if anyone has seen or heard or able to create 1M or 10M
> or 100M or 1B partitions in a topic? I understand lot of this depends on
> filesystem limitations (we are using ext4) and the OS limitations but I
> just would like to know what is the scale one had seen in production?
> 2. Is it advisable?
>
> Thanks!
>


is anyone able to create 1M or 10M or 100M or 1B partitions in a topic?

2017-05-16 Thread kant kodali
Hi All,

1. I was wondering if anyone has seen, heard of, or been able to create 1M, 10M,
100M, or 1B partitions in a topic? I understand a lot of this depends on
filesystem limitations (we are using ext4) and OS limitations, but I would just
like to know what scale one has seen in production.
2. Is it advisable?

Thanks!


Kafka Streams vs Flink (or any other stream processing framework)

2017-04-12 Thread kant kodali
Hi All,

I have read enough blogs from Confluent and others, and also books, that try to
explain the differences between the two. While it is great to know those
differences, I hardly find them useful in the decision-making process of which
one to pick, since I don't see clear boundaries between Kafka Streams and Flink.

To be clear, if I ask which one to pick, I am sure the answer will be "it depends
on your use case". Well, that's not what I intend to ask. All the use cases I
have come across so far can clearly be implemented using either Kafka Streams or
Flink (and I can give the benefit of the doubt to someone who has probably seen
better use cases), so the decision to pick one becomes very subjective, biased,
and just a matter of convenience, for example "oh, Flink would require an
additional deployment" or "Kafka is meant for microservice architectures" and so
on. Even with these examples I don't see why we could not implement with their
counterpart, so examples like this serve very little value for an engineer's
thought process. For sure it makes a great sales talk.

So, what would be really useful to know is: what is possible with Flink that is
not possible, or extremely hard to do, with Kafka Streams for any use case, or
vice versa? That way one would clearly know where the boundaries are.

If Kafka Streams and Flink are in the same race that is fine too, and it
would be good to know that is the case so people can pick whatever they
like.

Thanks!


Re: Can multiple Kafka consumers read from the same partition of same topic by default?

2017-03-30 Thread kant kodali
Hi,

Thanks! Can you explain a little bit of what's happening underneath? Does Kafka
create different group.ids by default when group.id is not set? When a group is
specified, only one consumer in the group can consume from a particular
partition, right?

Thanks!

On Thu, Mar 30, 2017 at 9:00 PM, Matthias J. Sax <matth...@confluent.io>
wrote:

> Yes, you can do that.
>
> -Matthias
>
>
>
> On 3/30/17 6:09 PM, kant kodali wrote:
> > Hi All,
> >
> > Can multiple Kafka consumers read from the same partition of same topic
> by
> > default? By default, I mean since group.id is not mandatory I am
> wondering
> > if I spawn multiple kafka consumers without specifying any group.id and
> > give them the same topic and partition name will they be able to read
> from
> > the same partition? I understand that If I give different group names for
> > each Kafka consumer then all the consumers can read from the same
> partition.
> >
> > Thanks!
> >
>
>


Can multiple Kafka consumers read from the same partition of same topic by default?

2017-03-30 Thread kant kodali
Hi All,

Can multiple Kafka consumers read from the same partition of the same topic by
default? By default, I mean: since group.id is not mandatory, I am wondering, if
I spawn multiple Kafka consumers without specifying any group.id and give them
the same topic and partition name, will they be able to read from the same
partition? I understand that if I give different group names to each Kafka
consumer then all the consumers can read from the same partition.
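
For reference, a minimal sketch of what this looks like with the Java consumer
(broker address and topic name are made up): each consumer calls assign() instead
of subscribe(), so no group coordination is involved and any number of consumers
can independently read the same partition.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("enable.auto.commit", "false");   // no group.id, so don't commit offsets to Kafka
props.put("auto.offset.reset", "earliest");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
// Manual assignment bypasses the group coordinator entirely.
consumer.assign(Collections.singletonList(new TopicPartition("my-topic", 0)));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
    }
}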

Thanks!


Should the number of App instances and Zookeeper servers be the same?

2017-01-25 Thread kant kodali
Should the number of App instances and Zookeeper servers be the same? I
understand the requirement of 2F+1 servers to tolerate F failures, but that is to
tolerate failures of the Zookeeper instances themselves. What about the number of
App instances? For example, say I have 3 Zookeeper servers and 2 instances of my
App that are managed by Zookeeper, and at any given time only one instance of my
App is running in master mode while the other is in standby mode. Now, I want to
be able to tolerate the failure of one instance of my App itself (not a Zookeeper
instance) such that the other instance of my App, which was running in standby
mode, gets elected as the new leader. Would that work, or must I have 3 instances
of my App and 3 Zookeeper servers?

What is the right configuration for the number of App instances and Zookeeper
servers?
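
From what I can tell, the App instances are only ZooKeeper clients, so their
count is independent of the 2F+1 ensemble size. A minimal leader-election sketch
with Apache Curator's LeaderLatch (Curator itself, the connection string, and the
latch path are assumptions for illustration, not something from this setup):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class AppInstance {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Both App instances create a latch on the same path; ZooKeeper elects one leader.
        LeaderLatch latch = new LeaderLatch(client, "/myapp/leader");
        latch.start();

        // Blocks until this instance becomes the leader (e.g. after the current master dies).
        latch.await();
        System.out.println("running in master mode now");
    }
}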


Thanks!


Re: How does Kafka emulate exactly once processing?

2016-12-21 Thread kant kodali
Hi Hans,

That's a great answer compared to the paragraphs I read online! I am assuming
you meant HDFS? What is JSDC? Any idea which is more common for this kind of use
case? Also, can I store offsets in ZooKeeper using ZAB instead of an external
store? I am not sure how ZooKeeper stores data, but I keep reading that you can
(perhaps ZooKeeper requires external storage?).

Thanks!

On Wed, Dec 21, 2016 at 5:11 PM, Hans Jespersen <h...@confluent.io> wrote:

> Exactly once Kafka Sink Connectors typically store the offset externally
> in the same atomic write as they store the messages. That way after a
> crash, they can check the external store (HSFS, JSDC, etc) retrieve the
> last committed offset and seek the the next message and continue processing
> with no duplicates and exactly once semantics.
>
> -hans
>
>
>
>
> > On Dec 21, 2016, at 4:39 PM, kant kodali <kanth...@gmail.com> wrote:
> >
> > How does Kafka emulate exactly once processing currently? Does it require
> > the producer to send at least once and consumer to de dupe?
> >
> > I did do my research but I feel like I am going all over the place so a
> > simple short answer would be great!
> >
> > Thanks!
>
>
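
For reference, a minimal sketch of the pattern described above (OffsetStore and
its methods are made-up placeholders for whatever external store is used, e.g.
HDFS or a JDBC database; consumer properties are the usual ones with auto-commit
disabled):

import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// Hypothetical store that can persist a message and its offset in one atomic write.
interface OffsetStore {
    long loadLastOffset(TopicPartition tp);          // -1 if nothing has been stored yet
    void saveAtomically(String value, long offset);  // value + offset in a single transaction
}

public static void run(KafkaConsumer<String, String> consumer, OffsetStore store) {
    TopicPartition tp = new TopicPartition("my-topic", 0);   // illustrative
    consumer.assign(Collections.singletonList(tp));

    // After a crash, resume from the offset recorded in the external store,
    // not from whatever Kafka thinks was committed.
    consumer.seek(tp, store.loadLastOffset(tp) + 1);

    while (true) {
        for (ConsumerRecord<String, String> record : consumer.poll(100)) {
            // The message and its offset land in the store together, so a crash can never
            // leave data without the matching offset (or vice versa): no duplicates, no loss.
            store.saveAtomically(record.value(), record.offset());
        }
    }
}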


How does Kafka emulate exactly once processing?

2016-12-21 Thread kant kodali
How does Kafka emulate exactly once processing currently? Does it require
the producer to send at least once and consumer to de dupe?

I did do my research but I feel like I am going all over the place so a
simple short answer would be great!

Thanks!


Re: is there a way to make sure two consumers receive the same message from the broker?

2016-11-08 Thread kant kodali
:)

On Tue, Nov 8, 2016 at 1:37 AM, AmirHossein Roozbahany <band...@outlook.com>
wrote:

> Excuse me this part was non-sense: if the latest update to a document in
> es always win in Cassandra's LWW, they will "eventually" "converge".
> 
> From: AmirHossein Roozbahany<mailto:band...@outlook.com>
> Sent: ‎11/‎8/‎2016 8:16 AM
> To: users@kafka.apache.org<mailto:users@kafka.apache.org>
> Subject: RE: is there a way to make sure two consumers receive the same
> message from the broker?
>
> Generally Cassandra itself is not consistent enough, even with quorum
> read-writes, say one of the writes fail, the nodes who received the data
> won't roll back and it might lead to dirty reads which in turn makes roll
> back logic tricky (if not impossible). You can use linearizable writes but
> if you want to use them all the time, why bother using Cassandra!?
>
> The important thing about Cassandra is that all of the nodes will
> eventually have the same data(after anti-entropy or read-repair). they are
> "convergent", they will "eventually" converge, and practically they
> converge pretty fast.
>
> I think what you might actually need is to make two databases
> convergent(not necessarily fully consistent at any given time) if the
> latest update to a document in es leads always win when Cassandra is doing
> es they will "eventually" "converge".
>
> Doing so is easy, as fast as I know es assigns a _version number to a
> document and increases it on every update, now if your use of in Cassandra
> insert statement as the "writetime". Now when Cassandra is doing read
> repair the record with higher writetime will win.
>
> Using es's document _version field is just one option, you can use
> something from you domain or kafka's offset or machine timestamp (not
> recommended at all).
>
> I hope it could help
> 
> From: kant kodali<mailto:kanth...@gmail.com>
> Sent: ‎11/‎7/‎2016 8:18 PM
> To: users@kafka.apache.org<mailto:users@kafka.apache.org>
> Subject: Re: is there a way to make sure two consumers receive the same
> message from the broker?
>
> Hi AmitHossein,
>
> I still don't see how that guarantees consistency at any given time. other
> words how do I know at time X the data in Cassandra and ES are the same.
>
> Thanks
>
>
> On Mon, Nov 7, 2016 at 3:26 AM, AmirHossein Roozbahany <
> diver...@outlook.com
> > wrote:
>
> > Hi
> >
> > Can you use elasticsearch _version field as cassandra's
> > writetime?(_version is strictly increasing, cassandra uses writetime for
> > applying LWW, so last write in elasticsearch will always win)
> >
> > It needs no transaction and makes databases convergent.
> >
> >
> > 
> > From: kant kodali <kanth...@gmail.com>
> > Sent: Monday, November 7, 2016 3:08 AM
> > To: users@kafka.apache.org
> > Subject: Re: is there a way to make sure two consumers receive the same
> > message from the broker?
> >
> > Hi Hans,
> >
> > The two storages we use are Cassandra and Elastic search and they are on
> > the same datacenter for now.
> > The Programming Language we use is Java and OS would be Ubuntu or CentOS.
> > We get messages in JSON format so we insert into Elastic Search directly
> > and for Cassandra we transform JSON message into appropriate model so we
> > could insert into a Cassandra table.
> > The rate we currently get is about 100K/sec which is awesome but I am
> > pretty sure this will go down once when we implement 2PC or transactional
> > writes.
> >
> > Thanks,
> > kant
> >
>


Re: is there a way to make sure two consumers receive the same message from the broker?

2016-11-07 Thread kant kodali
And there is https://github.com/vroyer/elassandra, which is still under active
development, and I am not sure how they plan to keep up with Apache Cassandra
moving forward.

On Mon, Nov 7, 2016 at 9:36 AM, kant kodali <kanth...@gmail.com> wrote:

> Fixing typo's
>
> Hi Tauzell,
>
> Yeah our users want to query, do aggregations on Elastic Search directly
> and we cannot have inconsistent data  because say the writes didn't make it
> into Cassandra but made it to Elastic search then a simple aggregations
> like count will lead to a wrong answer but again as @Hans pointed out this
> is no longer a Kafka question and also your solution has merits in its own
> way which I really appreciate it! your solution does make writes faster and
> probably some performance penalty on the read side given repairs happen
> during the read stage in Cassandra (We could check in both but since our
> users query elastic search directly there is no way for us to check it in
> Cassandra else we could go with your solution as well).
>
> Basically, we use ES as an index for Cassandra since secondary indexes in
> Cassandra (including the latest implementation SASI) doesn't work with our
> use case since we have high cardinality columns (which means every row in a
> column is unique so index on a high cardinality column is not very
> efficient given the underlying data structure used by SASI, but with
> inverted index which is used by ES is much faster).
>
> We do use Apache Spark along with Cassandra and I am trying to explore
> Succint http://succinct.cs.berkeley.edu/wp/wordpress/ and if everything
> works out with Succint we can get rid of elastic search. The only thing
> that I worry and still testing with Spark, Cassandra and Succint is whether
> If the aggregations/computations of a column or search on particular
> Cassandra field/column  can happen in real time given a big dataset (with
> ES it does so the goal is to see if we can get somewhere close or perform
> even better).
>
> Thanks!
>
>
>


Re: is there a way to make sure two consumers receive the same message from the broker?

2016-11-07 Thread kant kodali
Fixing typo's

Hi Tauzell,

Yeah our users want to query, do aggregations on Elastic Search directly
and we cannot have inconsistent data  because say the writes didn't make it
into Cassandra but made it to Elastic search then a simple aggregations
like count will lead to a wrong answer but again as @Hans pointed out this
is no longer a Kafka question and also your solution has merits in its own
way which I really appreciate it! your solution does make writes faster and
probably some performance penalty on the read side given repairs happen
during the read stage in Cassandra (We could check in both but since our
users query elastic search directly there is no way for us to check it in
Cassandra else we could go with your solution as well).

Basically, we use ES as an index for Cassandra since secondary indexes in
Cassandra (including the latest implementation SASI) doesn't work with our
use case since we have high cardinality columns (which means every row in a
column is unique so index on a high cardinality column is not very
efficient given the underlying data structure used by SASI, but with
inverted index which is used by ES is much faster).

We do use Apache Spark along with Cassandra and I am trying to explore
Succint http://succinct.cs.berkeley.edu/wp/wordpress/ and if everything
works out with Succint we can get rid of elastic search. The only thing
that I worry and still testing with Spark, Cassandra and Succint is whether
If the aggregations/computations of a column or search on particular
Cassandra field/column  can happen in real time given a big dataset (with
ES it does so the goal is to see if we can get somewhere close or perform
even better).

Thanks!


Re: is there a way to make sure two consumers receive the same message from the broker?

2016-11-07 Thread kant kodali
Hi Tauzell,

Yeah our users want to query, do aggregations on Elastic Search directly
and we cannot have inconsistent data  because say the writes didn't make it
into Cassandra but made it to Elastic search then a simple aggregations
like count will lead to a wrong answer but again as @Hans pointed out this
is no longer a Kafka question and also your solution has merits in its own
way which I really appreciate it! your solution does make writes faster and
probably some performance penalty on the read side give repairs happen
during the read stage in Cassandra (We could check in both but since our
users query elastic search directly there is no way for us to check it in
Cassandra we could go with your solution as well).

Basically, we use ES as an index for Cassandra since secondary indexes in
Cassandra (including the latest implementation SASI) doesn't work with our
use case since we have high cardinality columns (which means every row in a
column is unique so index on a high cardinality column is not very
efficient given the underlying data structure used by SASI, but with
inverted index which is used by ES is much faster).

We do use Apache Spark along with Cassandra and I am trying to explore
Succint http://succinct.cs.berkeley.edu/wp/wordpress/ and if everything
works out with Succint we can get rid of elastic search. The only thing
that I worry and still testing with Spark, Cassandra and Succint is whether
If the aggregations/computations of a column or search on particular
Cassandra field/column  can happen in real time given a big dataset (with
ES it does so the goal is to see if we can get somewhere close or perform
even better).

Thanks!



On Mon, Nov 7, 2016 at 8:57 AM, Tauzell, Dave <dave.tauz...@surescripts.com>
wrote:

> Here is a scenario where this could be useful:
>
>Add the kafka offset as a field on the record in both Cassandra and
> Elasticsearch
>
> Now when you get search results from Elastic search and look up details in
> Cassandra you can know if they come from the same kafka record.   If you
> can use the offset as part of the Cassandra Partition key, or as a
> clustering key, then you could specifically retrieve a version of the
> record from Cassandra that matches Elasticsearch (assuming it made it).
>
> If your real goal is to guarantee that the two datasets always have the
> same set of messages from Kafka ... I don't think this is possible.
>
> -Dave
>
> -Original Message-
> From: kant kodali [mailto:kanth...@gmail.com]
> Sent: Monday, November 7, 2016 10:48 AM
> To: users@kafka.apache.org
> Subject: Re: is there a way to make sure two consumers receive the same
> message from the broker?
>
> Hi AmitHossein,
>
> I still don't see how that guarantees consistency at any given time. other
> words how do I know at time X the data in Cassandra and ES are the same.
>
> Thanks
>
>
> On Mon, Nov 7, 2016 at 3:26 AM, AmirHossein Roozbahany <
> diver...@outlook.com
> > wrote:
>
> > Hi
> >
> > Can you use elasticsearch _version field as cassandra's
> > writetime?(_version is strictly increasing, cassandra uses writetime
> > for applying LWW, so last write in elasticsearch will always win)
> >
> > It needs no transaction and makes databases convergent.
> >
> >
> > 
> > From: kant kodali <kanth...@gmail.com>
> > Sent: Monday, November 7, 2016 3:08 AM
> > To: users@kafka.apache.org
> > Subject: Re: is there a way to make sure two consumers receive the
> > same message from the broker?
> >
> > Hi Hans,
> >
> > The two storages we use are Cassandra and Elastic search and they are
> > on the same datacenter for now.
> > The Programming Language we use is Java and OS would be Ubuntu or CentOS.
> > We get messages in JSON format so we insert into Elastic Search
> > directly and for Cassandra we transform JSON message into appropriate
> > model so we could insert into a Cassandra table.
> > The rate we currently get is about 100K/sec which is awesome but I am
> > pretty sure this will go down once when we implement 2PC or
> > transactional writes.
> >
> > Thanks,
> > kant
> >
> This e-mail and any files transmitted with it are confidential, may
> contain sensitive information, and are intended solely for the use of the
> individual or entity to whom they are addressed. If you have received this
> e-mail in error, please notify the sender by reply e-mail immediately and
> destroy all copies of the e-mail and any attachments.
>


Re: is there a way to make sure two consumers receive the same message from the broker?

2016-11-07 Thread kant kodali
Hi AmirHossein,

I still don't see how that guarantees consistency at any given time. In other
words, how do I know that at time X the data in Cassandra and ES are the same?

Thanks


On Mon, Nov 7, 2016 at 3:26 AM, AmirHossein Roozbahany <diver...@outlook.com
> wrote:

> Hi
>
> Can you use elasticsearch _version field as cassandra's
> writetime?(_version is strictly increasing, cassandra uses writetime for
> applying LWW, so last write in elasticsearch will always win)
>
> It needs no transaction and makes databases convergent.
>
>
> ____
> From: kant kodali <kanth...@gmail.com>
> Sent: Monday, November 7, 2016 3:08 AM
> To: users@kafka.apache.org
> Subject: Re: is there a way to make sure two consumers receive the same
> message from the broker?
>
> Hi Hans,
>
> The two storages we use are Cassandra and Elastic search and they are on
> the same datacenter for now.
> The Programming Language we use is Java and OS would be Ubuntu or CentOS.
> We get messages in JSON format so we insert into Elastic Search directly
> and for Cassandra we transform JSON message into appropriate model so we
> could insert into a Cassandra table.
> The rate we currently get is about 100K/sec which is awesome but I am
> pretty sure this will go down once when we implement 2PC or transactional
> writes.
>
> Thanks,
> kant
>


Re: is there a way to make sure two consumers receive the same message from the broker?

2016-11-07 Thread kant kodali
Hi Hans,

We currently do #2 and that is quite slow, so yeah, in theory #1 is probably a
better choice, although it's not quite what we want since it doesn't guarantee
consistency at any given time, as you have already pointed out. Thanks a lot for
the response!

kant

On Mon, Nov 7, 2016 at 6:31 AM, Hans Jespersen <h...@confluent.io> wrote:

> I don't believe that either of your two storage systems support distributed
> atomic transactions.
> You are just going to have to do one of the following:
> 1) update them separately (in parallel) and be aware that their committed
> offsets may be slightly different at certain points in time
> 2) update one and when you are sure the data is in the first storage, then
> update the other storage and be aware that you need to handle your own
> rollback logic if the second storage system is down or throws an error when
> you try to write to it.
>
> It is very common in Kafka community to do #1 but in either case this is no
> longer a Kafka question and has become more of a a distributed database
> design question.
>
> -hans
>
> /**
>  * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
>  * h...@confluent.io (650)924-2670
>  */
>
> On Sun, Nov 6, 2016 at 7:08 PM, kant kodali <kanth...@gmail.com> wrote:
>
> > Hi Hans,
> >
> > The two storages we use are Cassandra and Elastic search and they are on
> > the same datacenter for now.
> > The Programming Language we use is Java and OS would be Ubuntu or CentOS.
> > We get messages in JSON format so we insert into Elastic Search directly
> > and for Cassandra we transform JSON message into appropriate model so we
> > could insert into a Cassandra table.
> > The rate we currently get is about 100K/sec which is awesome but I am
> > pretty sure this will go down once when we implement 2PC or transactional
> > writes.
> >
> > Thanks,
> > kant
> >
>


Re: is there a way to make sure two consumers receive the same message from the broker?

2016-11-06 Thread kant kodali
Hi Hans,

The two storages we use are Cassandra and Elastic search and they are on
the same datacenter for now.
The Programming Language we use is Java and OS would be Ubuntu or CentOS.
We get messages in JSON format so we insert into Elastic Search directly
and for Cassandra we transform JSON message into appropriate model so we
could insert into a Cassandra table.
The rate we currently get is about 100K/sec which is awesome but I am
pretty sure this will go down once when we implement 2PC or transactional
writes.

Thanks,
kant


Re: is there a way to make sure two consumers receive the same message from the broker?

2016-11-06 Thread kant kodali
Hi! Thanks. any pointers on how to do that?

On Sun, Nov 6, 2016 at 2:32 PM, Tauzell, Dave <dave.tauz...@surescripts.com>
wrote:

> You should have one consumer pull the message and submit the data to each
> storage using an XA transaction.
>
> > On Nov 5, 2016, at 19:49, kant kodali <kanth...@gmail.com> wrote:
> >
> > yes this problem can definetly be approached in many ways but given the
> > hard constraints by our clients we don't seem to have many options. so
> the
> > problem is we have to keep two storages systems in sync all the time. so
> > whatever the data that is storage 1 should also be in storage 2 at any
> > given time. so we explored the following options
> >
> > 1)  we thought about ETL from storage 1 to storage 2 but that approach
> has
> > bunch of  drawbacks given the time constraints.
> > 2)  Add a common service on top of two storages and do some sort of 2PC
> but
> > that would degrade the write performance. Morever we dont really have a
> > control over how fast each write/store can happen at each storage layer
> > (because these two storages are completely different).
> >
> > so I started exploring if there is any tricks I could do with Kafka?
> >
> >
> >
> >> On Sat, Nov 5, 2016 at 5:01 PM, Hans Jespersen <h...@confluent.io>
> wrote:
> >>
> >> Yes exactly. If consumer 1 gets message with offset 17 then it can write
> >> that offset into an external storage that consumer 2 can also check to
> >> ensure that it keeps in sync with consumer 1.
> >>
> >> Just curious though why you would need to do this? What is the use case
> >> because there may be a better way to get you the functionality you want?
> >>
> >> -hans
> >>
> >>
> >>
> >>
> >>> On Nov 5, 2016, at 4:31 PM, kant kodali <kanth...@gmail.com> wrote:
> >>>
> >>> I am new to Kafka and reading this statement "write consumer 1 and
> >> consumer
> >>> 2 to share a common external offset storage" I can interpret it many
> ways
> >>> but my best guess is as follows.
> >>>
> >>> Are you saying write the current offset of each consumer to a common
> >>> external storage?
> >>>
> >>>
> >>>> On Sat, Nov 5, 2016 at 4:15 PM, kant kodali <kanth...@gmail.com>
> wrote:
> >>>>
> >>>> Hi Hans,
> >>>>
> >>>> What do you mean by "write consumer 1 and consumer 2 to share a common
> >>>> external offset storage" ? can you please elaborate a bit more.
> >>>>
> >>>> Thanks!
> >>>>
> >>>> On Sat, Nov 5, 2016 at 4:00 PM, Hans Jespersen <h...@confluent.io>
> >> wrote:
> >>>>
> >>>>> There is no built in mechanism to do this in Apache Kafka but if you
> >> can
> >>>>> write consumer 1 and consumer 2 to share a common external offset
> >> storage
> >>>>> then you may be able to build the functionality you seek.
> >>>>>
> >>>>> -hans
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Nov 5, 2016, at 3:55 PM, kant kodali <kanth...@gmail.com> wrote:
> >>>>>>
> >>>>>> Sorry there is a typo. here is a restatement.
> >>>>>>
> >>>>>> Is there a way to make sure two consumers receive the same message
> >> from
> >>>>> the
> >>>>>> kafka broker in a atomic way? such that if consumer 1 gets a message
> >>>>>> consumer 2 should also get that message and if  consumer 1 fails for
> >>>>>> whatever reason consumer 2 should also rollback to previous offset
> (to
> >>>>> the
> >>>>>> same offset as consumer 1) or invalidate or something like that. is
> >> that
> >>>>>> possible?
> >>>>>
> >>>>>
> >>>>
> >>
> >>
> This e-mail and any files transmitted with it are confidential, may
> contain sensitive information, and are intended solely for the use of the
> individual or entity to whom they are addressed. If you have received this
> e-mail in error, please notify the sender by reply e-mail immediately and
> destroy all copies of the e-mail and any attachments.
>


Re: is there a way to make sure two consumers receive the same message from the broker?

2016-11-05 Thread kant kodali
Yes, this problem can definitely be approached in many ways, but given the hard
constraints from our clients we don't seem to have many options. The problem is
that we have to keep two storage systems in sync all the time, so whatever data
is in storage 1 should also be in storage 2 at any given time. So we explored
the following options:

1) We thought about ETL from storage 1 to storage 2, but that approach has a
bunch of drawbacks given the time constraints.
2) Add a common service on top of the two storages and do some sort of 2PC, but
that would degrade write performance. Moreover, we don't really have control
over how fast each write/store can happen at each storage layer (because these
two storages are completely different).

So I started exploring whether there are any tricks I could do with Kafka.



On Sat, Nov 5, 2016 at 5:01 PM, Hans Jespersen <h...@confluent.io> wrote:

> Yes exactly. If consumer 1 gets message with offset 17 then it can write
> that offset into an external storage that consumer 2 can also check to
> ensure that it keeps in sync with consumer 1.
>
> Just curious though why you would need to do this? What is the use case
> because there may be a better way to get you the functionality you want?
>
> -hans
>
>
>
>
> > On Nov 5, 2016, at 4:31 PM, kant kodali <kanth...@gmail.com> wrote:
> >
> > I am new to Kafka and reading this statement "write consumer 1 and
> consumer
> > 2 to share a common external offset storage" I can interpret it many ways
> > but my best guess is as follows.
> >
> > Are you saying write the current offset of each consumer to a common
> > external storage?
> >
> >
> > On Sat, Nov 5, 2016 at 4:15 PM, kant kodali <kanth...@gmail.com> wrote:
> >
> >> Hi Hans,
> >>
> >> What do you mean by "write consumer 1 and consumer 2 to share a common
> >> external offset storage" ? can you please elaborate a bit more.
> >>
> >> Thanks!
> >>
> >> On Sat, Nov 5, 2016 at 4:00 PM, Hans Jespersen <h...@confluent.io>
> wrote:
> >>
> >>> There is no built in mechanism to do this in Apache Kafka but if you
> can
> >>> write consumer 1 and consumer 2 to share a common external offset
> storage
> >>> then you may be able to build the functionality you seek.
> >>>
> >>> -hans
> >>>
> >>>
> >>>
> >>>> On Nov 5, 2016, at 3:55 PM, kant kodali <kanth...@gmail.com> wrote:
> >>>>
> >>>> Sorry there is a typo. here is a restatement.
> >>>>
> >>>> Is there a way to make sure two consumers receive the same message
> from
> >>> the
> >>>> kafka broker in a atomic way? such that if consumer 1 gets a message
> >>>> consumer 2 should also get that message and if  consumer 1 fails for
> >>>> whatever reason consumer 2 should also rollback to previous offset (to
> >>> the
> >>>> same offset as consumer 1) or invalidate or something like that. is
> that
> >>>> possible?
> >>>
> >>>
> >>
>
>


Re: is there a way to make sure two consumers receive the same message from the broker?

2016-11-05 Thread kant kodali
I am new to Kafka, and reading the statement "write consumer 1 and consumer 2 to
share a common external offset storage", I can interpret it many ways, but my
best guess is as follows.

Are you saying write the current offset of each consumer to a common
external storage?
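
If so, a rough sketch of what I imagine that would look like (SharedOffsetStore
and its methods are made-up names for whatever common external storage is used):

// consumer 1, after processing the record with offset 17:
sharedOffsetStore.put("my-topic-0", record.offset());           // hypothetical external write

// consumer 2, before processing its next record:
long consumer1Offset = sharedOffsetStore.get("my-topic-0");     // hypothetical external read
if (record.offset() > consumer1Offset) {
    // consumer 1 has not confirmed this offset yet; wait or seek back so the two stay in step
    consumer.seek(new TopicPartition("my-topic", 0), consumer1Offset);
}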


On Sat, Nov 5, 2016 at 4:15 PM, kant kodali <kanth...@gmail.com> wrote:

> Hi Hans,
>
> What do you mean by "write consumer 1 and consumer 2 to share a common
> external offset storage" ? can you please elaborate a bit more.
>
> Thanks!
>
> On Sat, Nov 5, 2016 at 4:00 PM, Hans Jespersen <h...@confluent.io> wrote:
>
>> There is no built in mechanism to do this in Apache Kafka but if you can
>> write consumer 1 and consumer 2 to share a common external offset storage
>> then you may be able to build the functionality you seek.
>>
>> -hans
>>
>>
>>
>> > On Nov 5, 2016, at 3:55 PM, kant kodali <kanth...@gmail.com> wrote:
>> >
>> > Sorry there is a typo. here is a restatement.
>> >
>> > Is there a way to make sure two consumers receive the same message from
>> the
>> > kafka broker in a atomic way? such that if consumer 1 gets a message
>> > consumer 2 should also get that message and if  consumer 1 fails for
>> > whatever reason consumer 2 should also rollback to previous offset (to
>> the
>> > same offset as consumer 1) or invalidate or something like that. is that
>> > possible?
>>
>>
>


Re: is there a way to make sure two consumers receive the same message from the broker?

2016-11-05 Thread kant kodali
Hi Hans,

What do you mean by "write consumer 1 and consumer 2 to share a common
external offset storage" ? can you please elaborate a bit more.

Thanks!

On Sat, Nov 5, 2016 at 4:00 PM, Hans Jespersen <h...@confluent.io> wrote:

> There is no built in mechanism to do this in Apache Kafka but if you can
> write consumer 1 and consumer 2 to share a common external offset storage
> then you may be able to build the functionality you seek.
>
> -hans
>
>
>
> > On Nov 5, 2016, at 3:55 PM, kant kodali <kanth...@gmail.com> wrote:
> >
> > Sorry there is a typo. here is a restatement.
> >
> > Is there a way to make sure two consumers receive the same message from
> the
> > kafka broker in a atomic way? such that if consumer 1 gets a message
> > consumer 2 should also get that message and if  consumer 1 fails for
> > whatever reason consumer 2 should also rollback to previous offset (to
> the
> > same offset as consumer 1) or invalidate or something like that. is that
> > possible?
>
>


Re: is there a way to make sure two consumers receive the same message from the broker?

2016-11-05 Thread kant kodali
Sorry, there was a typo. Here is a restatement.

Is there a way to make sure two consumers receive the same message from the
Kafka broker in an atomic way? Such that if consumer 1 gets a message, consumer
2 should also get that message, and if consumer 1 fails for whatever reason,
consumer 2 should also roll back to the previous offset (to the same offset as
consumer 1) or invalidate it or something like that. Is that possible?


is there a way to make sure two consumers receive the same message from the broker?

2016-11-05 Thread kant kodali
is there a way to make sure two consumers receive the same message from the
kafka broker in a atomic way? such that if consumer 1 gets a message
consumer 2 should also get that message and in case one of the consumer
fails for whatever reason consumer 2 should also rollback to previous
offset or invalidate or something like that. is that possible?


Re: producer can't push msg sometimes with 1 broker recoved

2016-09-23 Thread kant kodali
@Fei Just curious why you guys are interested in using Kafka. I thought
Alcatel-Lucent usually creates its own software, no?
 





On Fri, Sep 23, 2016 10:36 PM, Kamal C kamaltar...@gmail.com wrote:
Reduce the metadata refresh interval 'metadata.max.age.ms' from 5 min to your
desired time interval. This may reduce the window of broker non-availability.

-- Kamal
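
For concreteness, lowering that on the producer would look roughly like this
(broker address and interval are illustrative):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// Refresh cluster metadata every 30s instead of the default 5 minutes.
props.put("metadata.max.age.ms", "30000");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);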

Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-23 Thread kant kodali
@Gerard
Here are my initial benchmarks:

Producer on Machine 1 (m4.xlarge on AWS)
Broker on Machine 2 (m4.xlarge on AWS)
Consumer on Machine 3 (m4.xlarge on AWS)
Data size: 1.2KB
Receive throughput: ~24K
Kafka receive throughput: ~58K (same exact configuration)

All the benchmarks I ran are with default options. So what the Pulsar guys are
saying is that Kafka doesn't persist every message by default; instead it
batches them for a period of time and then persists, so if the JVM crashes
before it persists, all the messages in the batch are lost, whereas Pulsar
guarantees strong durability by storing every message to a write-ahead log, so
messages are never lost.

My question now is: what settings do I need to change in Kafka so it will store
every message? That way I am comparing apples to apples.
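
For reference, the knobs usually mentioned for this are the flush settings and
producer acks; an illustrative sketch, not something confirmed in this thread:

# topic/broker level: fsync after every message instead of leaving it to the OS page cache
flush.messages=1
# producer level: wait for all in-sync replicas to acknowledge each write
acks=all

Fsyncing every message obviously trades throughput for durability.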
 





On Fri, Sep 23, 2016 12:06 AM, Gerard Klijs gerard.kl...@dizzit.com
wrote:
I haven't tried it myself, nor very likely will in the near future, but

since it's also distributed I guess that with a large enough cluster you

will be able to handle any load. One of the things kafka might be better at

is more connecters available, a better at least once guarantee, better

monitoring options. I really don't know, but if latancy is really important

pulsar might be better, they used kafka before at yahoo and maybe still do

for some stuff, recent work on https://github.com/yahoo/kafka-manager seems

to suggest so.

Alternatively you could configure a kafka topic/producer/consumer to limit

latency, and that may also be enough to get a low enough latency. It would

certainly be interesting to compare the two, with the same hardware, and

with high load.




On Thu, Sep 22, 2016 at 6:01 PM kant kodali <kanth...@gmail.com> wrote:




> @Gerard Thanks for this. It looks good any benchmarks on this throughput

> wise?

>

>

>

>

>

>

> On Thu, Sep 22, 2016 7:45 AM, Gerard Klijs gerard.kl...@dizzit.com

> wrote:

> We have a simple application producing 1 msg/sec, and did nothing to

>

> optimise the performance and have about a 10 msec delay between consumer

>

> and producer. When low latency is important, maybe pulsar is a better fit,

>

> https://www.datanami.com/2016/09/07/yahoos-new-pulsar-kafka-competitor/ .

>

>

>

>

> On Tue, Sep 20, 2016 at 2:24 PM Michael Freeman <mikfree...@gmail.com>

>

> wrote:

>

>

>

>

> > Thanks for sharing Radek, great article.

>

> >

>

> > Michael

>

> >

>

> > > On 17 Sep 2016, at 21:13, Radoslaw Gruchalski <ra...@gruchalski.com>

>

> > wrote:

>

> > >

>

> > > Please read this article:

>

> > >

>

> >

>

>
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

>

> > >

>

> > > –

>

> > > Best regards,

>

> > > Radek Gruchalski

>

> > > ra...@gruchalski.com

>

> > >

>

> > >

>

> > > On September 17, 2016 at 9:49:43 PM, kant kodali (kanth...@gmail.com)

>

> > wrote:

>

> > >

>

> > > Still it should be possible to implement using reactive streams right.

>

> > > Could you please enlighten me on what are the some major differences

> you

>

> > > see

>

> > > between a commit log and a message queue? I see them being different

> only

>

> > > in the

>

> > > implementation but not functionality wise so I would be glad to hear

> your

>

> > > thoughts.

>

> > >

>

> > >

>

> > >

>

> > >

>

> > >

>

> > >

>

> > > On Sat, Sep 17, 2016 12:39 PM, Radoslaw Gruchalski

> ra...@gruchalski.com

>

> > > wrote:

>

> > > Kafka is not a queue. It’s a distributed commit log.

>

> > >

>

> > >

>

> > >

>

> > >

>

> > > –

>

> > >

>

> > > Best regards,

>

> > >

>

> > > Radek Gruchalski

>

> > >

>

> > > ra...@gruchalski.com

>

> > >

>

> > >

>

> > >

>

> > >

>

> > >

>

> > >

>

> > >

>

> > > On September 17, 2016 at 9:23:09 PM, kant kodali (kanth...@gmail.com)

>

> > > wrote:

>

> > >

>

> > >

>

> > >

>

> > >

>

> > > Hmm...Looks like Kafka is written in Scala. There is this thing called

>

> > >

>

> > > reactive

>

> > >

>

> > > streams where a slow consumer can apply 

Does Kafka Sync/persist every message from a publisher by default?

2016-09-22 Thread kant kodali

Does Kafka sync/persist every message from a publisher by default? If not, what
settings should I change so it syncs every message?
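
For reference, the broker-side setting usually mentioned for this (illustrative;
by default Kafka leaves fsync timing to the OS page cache and relies on
replication for durability):

# server.properties: fsync after every message
log.flush.interval.messages=1

The per-topic equivalent is the flush.messages topic config.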

Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-22 Thread kant kodali
@Gerard Thanks for this. It looks good any benchmarks on this throughput wise?
 





On Thu, Sep 22, 2016 7:45 AM, Gerard Klijs gerard.kl...@dizzit.com
wrote:
We have a simple application producing 1 msg/sec, and did nothing to

optimise the performance and have about a 10 msec delay between consumer

and producer. When low latency is important, maybe pulsar is a better fit,

https://www.datanami.com/2016/09/07/yahoos-new-pulsar-kafka-competitor/ .




On Tue, Sep 20, 2016 at 2:24 PM Michael Freeman <mikfree...@gmail.com>

wrote:




> Thanks for sharing Radek, great article.

>

> Michael

>

> > On 17 Sep 2016, at 21:13, Radoslaw Gruchalski <ra...@gruchalski.com>

> wrote:

> >

> > Please read this article:

> >

>
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

> >

> > –

> > Best regards,

> > Radek Gruchalski

> > ra...@gruchalski.com

> >

> >

> > On September 17, 2016 at 9:49:43 PM, kant kodali (kanth...@gmail.com)

> wrote:

> >

> > Still it should be possible to implement using reactive streams right.

> > Could you please enlighten me on what are the some major differences you

> > see

> > between a commit log and a message queue? I see them being different only

> > in the

> > implementation but not functionality wise so I would be glad to hear your

> > thoughts.

> >

> >

> >

> >

> >

> >

> > On Sat, Sep 17, 2016 12:39 PM, Radoslaw Gruchalski ra...@gruchalski.com

> > wrote:

> > Kafka is not a queue. It’s a distributed commit log.

> >

> >

> >

> >

> > –

> >

> > Best regards,

> >

> > Radek Gruchalski

> >

> > ra...@gruchalski.com

> >

> >

> >

> >

> >

> >

> >

> > On September 17, 2016 at 9:23:09 PM, kant kodali (kanth...@gmail.com)

> > wrote:

> >

> >

> >

> >

> > Hmm...Looks like Kafka is written in Scala. There is this thing called

> >

> > reactive

> >

> > streams where a slow consumer can apply back pressure if they are

> consuming

> >

> > slow. Even with Java this is possible with a Library called RxJava and

> >

> > these

> >

> > ideas will be incorporated in Java 9 as well.

> >

> > I still don't see why they would pick poll just to solve this one problem

> >

> > and

> >

> > compensating on others. Poll just don't sound realtime. I heard from some

> >

> > people

> >

> > that they would set poll to 100ms. Well 1) that is a lot of time. 2)

> >

> > Financial

> >

> > applications requires micro second latency. Kafka from what I understand

> >

> > looks

> >

> > like has a very high latency and here is the article.

> >

> > http://bravenewgeek.com/dissecting-message-queues/ I usually don't go by

> >

> > articles but I ran my own experiments on different queues and my numbers

> >

> > are

> >

> > very close to this article so I would say whoever wrote this article has

> >

> > done a

> >

> > good Job. 3) poll does generate unnecessary traffic in case if the data

> >

> > isn't

> >

> > available.

> >

> > Finally still not sure why they would pick poll() ? or do they plan on

> >

> > introducing reactive streams?Thanks,kant

> > On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski ra...@gruchalski.com

> >

> > wrote:

> >

> > I'm only guessing here regarding if this is the reason:

> >

> >

> >

> >

> > Pull is much more sensible when a lot of data is pushed through. It

> allows

> >

> > consumers consuming at their own pace, slow consumers do not slow the

> >

> > complete

> >

> > system down.

> >

> >

> >

> >

> >

> >

> >

> >

> >

> >

> >

> >

> >

> > --

> >

> >

> >

> >

> > Best regards,

> >

> >

> >

> >

> > Rad


> > On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" <

> kanth...@gmail.com>

> >

> > wrote:


> > why did Kafka choose pull instead of push for a consumer? push sounds

> like

> >

> > it

> >

> >

> >

> >

> > is more realtime to me than poll and also wouldn't poll just keeps

> polling

> >

> > even

> >

> >

> >

> >

> > when they are no messages in the broker causing more traffic? please

> >

> > enlighten

> >

> >

> >

> >

> > me

>

Re: any update on this?

2016-09-19 Thread kant kodali
Yes, of course, the goal shouldn't be moving towards Consul specifically. It should just be
flexible enough for users to pick any distributed coordination system.
 





On Mon, Sep 19, 2016 2:23 AM, Jens Rantil jens.ran...@tink.se
wrote:
I think I read somewhere that the long-term goal is to make Kafka

independent of Zookeeper altogether. Maybe not worth spending time on

migrating to Consul in that case.




Cheers,

Jens




On Sat, Sep 17, 2016 at 10:38 PM Jennifer Fountain <jfount...@meetme.com>

wrote:




> +2 watching.

>

> On Sat, Sep 17, 2016 at 2:45 AM, kant kodali <kanth...@gmail.com> wrote:

>

> > https://issues.apache.org/jira/browse/KAFKA-1793

> > It would be great to use Consul instead of Zookeeper for Kafka and I

> think

> > it

> > would benefit Kafka a lot from the exponentially growing consul

> community.

>

>

>

>

> --

>

>

> Jennifer Fountain

> DevOPS

>

-- 




Jens Rantil

Backend Developer @ Tink




Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden

For urgent matters you can reach me at +46-708-84 18 32.

Re: Kafka usecase

2016-09-18 Thread kant kodali

Why does Comcast need to do better than 1-2 seconds?






On Sun, Sep 18, 2016 8:08 PM, Ghosh, Achintya (Contractor) 
achintya_gh...@comcast.com

wrote:
Hi there,




We have a use case where we do a lot of business logic to process each message,
and sometimes it takes 1-2 seconds. Will Kafka fit our use case?




Thanks

Achintya
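
If each record really takes 1-2 seconds to process, the consumer-side settings below
are usually the ones to look at. This is only a hedged sketch with made-up values;
the property names are from the newer Java consumer:

    import java.util.Properties;

    // A hedged sketch of consumer settings for work that takes 1-2 s per record.
    final class SlowConsumerConfig {
        static Properties build() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092");  // assumption: your broker list
            props.put("group.id", "slow-processing-app");   // hypothetical group id
            props.put("enable.auto.commit", "false");       // commit offsets only after the work succeeds
            props.put("max.poll.records", "10");            // ~10 records x ~2 s = ~20 s per poll iteration
            props.put("max.poll.interval.ms", "300000");    // allow up to 5 minutes between polls before a rebalance
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            return props;
        }
    }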

Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-17 Thread kant kodali
Still, it should be possible to implement this using reactive streams, right?
Could you please enlighten me on what major differences you see
between a commit log and a message queue? I see them differing only in
implementation, not in functionality, so I would be glad to hear your
thoughts.
 





On Sat, Sep 17, 2016 12:39 PM, Radoslaw Gruchalski ra...@gruchalski.com
wrote:
Kafka is not a queue. It’s a distributed commit log.




–

Best regards,

Radek Gruchalski

ra...@gruchalski.com







On September 17, 2016 at 9:23:09 PM, kant kodali (kanth...@gmail.com) wrote:




Hmm...Looks like Kafka is written in Scala. There is this thing called

reactive

streams where a slow consumer can apply back pressure if they are consuming

slow. Even with Java this is possible with a Library called RxJava and

these

ideas will be incorporated in Java 9 as well.

I still don't see why they would pick poll just to solve this one problem

and

compensating on others. Poll just don't sound realtime. I heard from some

people

that they would set poll to 100ms. Well 1) that is a lot of time. 2)

Financial

applications requires micro second latency. Kafka from what I understand

looks

like has a very high latency and here is the article.

http://bravenewgeek.com/dissecting-message-queues/ I usually don't go by

articles but I ran my own experiments on different queues and my numbers

are

very close to this article so I would say whoever wrote this article has

done a

good Job. 3) poll does generate unnecessary traffic in case if the data

isn't

available.

Finally still not sure why they would pick poll() ? or do they plan on

introducing reactive streams?Thanks,kant



















On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski ra...@gruchalski.com

wrote:

I'm only guessing here regarding if this is the reason:




Pull is much more sensible when a lot of data is pushed through. It allows

consumers consuming at their own pace, slow consumers do not slow the

complete

system down.













-- 




Best regards,




Rad








































On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" <kanth...@gmail.com>

wrote:






























































































why did Kafka choose pull instead of push for a consumer? push sounds like

it




is more realtime to me than poll and also wouldn't poll just keeps polling

even




when they are no messages in the broker causing more traffic? please

enlighten




me

Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-17 Thread kant kodali
Hmm... it looks like Kafka is written in Scala. There is this thing called reactive
streams, where a slow consumer can apply back pressure if it is consuming
slowly. Even in Java this is possible with a library called RxJava, and these
ideas will be incorporated in Java 9 as well.
I still don't see why they would pick poll just to solve this one problem while
compromising on others. Poll just doesn't sound realtime. I heard from some people
that they set the poll interval to 100 ms. Well, 1) that is a lot of time. 2) Financial
applications require microsecond latency, and from what I understand Kafka has
fairly high latency; here is the article:
http://bravenewgeek.com/dissecting-message-queues/ I usually don't go by
articles, but I ran my own experiments on different queues and my numbers are
very close to this article, so I would say whoever wrote it has done a
good job. 3) Polling generates unnecessary traffic when no data is
available.
Finally, I am still not sure why they picked poll(), or do they plan on
introducing reactive streams?
Thanks,
kant
 





On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski ra...@gruchalski.com
wrote:
I'm only guessing here regarding if this is the reason:

Pull is much more sensible when a lot of data is pushed through. It allows
consumers consuming at their own pace, slow consumers do not slow the complete
system down.




-- 

Best regards,

Rad













On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" <kanth...@gmail.com>
wrote:































why did Kafka choose pull instead of push for a consumer? push sounds like it

is more realtime to me than poll and also wouldn't poll just keeps polling even

when they are no messages in the broker causing more traffic? please enlighten

me
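
For what it's worth, the pull model already gives the consumer explicit flow control,
and the broker long-polls fetch requests (fetch.max.wait.ms / fetch.min.bytes), so an
idle consumer is not generating a flood of empty responses. A minimal sketch with the
Java consumer (broker address, group id, topic name, and the process() step are
placeholders):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class PacedConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
            props.put("group.id", "paced-consumer");          // hypothetical group id
            props.put("max.poll.records", "100");             // cap the work fetched per poll
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("kafka_test")); // placeholder topic
                while (true) {
                    // The consumer, not the broker, decides when to fetch more: a slow
                    // consumer simply falls behind in the log instead of being flooded.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                    for (ConsumerRecord<String, String> record : records) {
                        process(record.value()); // hypothetical slow business logic
                    }
                }
            }
        }

        private static void process(String value) { /* ... */ }
    }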

why did Kafka choose pull instead of push for a consumer ?

2016-09-17 Thread kant kodali

Why did Kafka choose pull instead of push for a consumer? Push sounds more
realtime to me than poll, and wouldn't poll just keep polling even when there
are no messages in the broker, causing more traffic? Please enlighten
me.

Re: can one topic be registered in multiple brokers?

2016-09-17 Thread kant kodali
So Zookeeper will select which broker it should direct the message to if I have
3 brokers and 3 partitions of a topic?
I only finished the benchmarks of Kafka and NSQ and am still working on NATS
Streaming Server (one more day and I will finish it). But so far Kafka had great
throughput with the Java client: about 55,000 messages per second receive
throughput and 70K messages per second send throughput, each message 1KB in size.
So yeah, something is wrong with the node.js client.
NSQ was ~3K receive throughput and ~4K send throughput.
One thing about my NSQ benchmark: first of all, I am new to Go, and I have the
following code (pasting only the lines that are relevant). I just don't know if
w.Publish is a blocking call. If it is, I have to see whether the library offers
a non-blocking async call, since the Java version I wrote for Kafka was async;
that way I am comparing apples to apples.

    w, _ := nsq.NewProducer("172.31.18.175:4150", config)
    for i := 0; i < 50; i++ {
        err := w.Publish("nsq_go_test", []byte(data))
        if err != nil {
            log.Panic("Could not connect")
        }
    }

From what I heard, beating NATS is going to be harder, but I would rather run my
own experiment than go with what I heard. Also it






On Sat, Sep 17, 2016 1:38 AM, Ali Akhtar ali.rac...@gmail.com
wrote:
You can create multiple partitions of a topic and kafka will attempt to

distribute them evenly.




E.g if you have 3 brokers and you create 3 partitions of a topic, each

broker will be the leader of 1 of the 3 partitions.




P.S how did the benchmarking go?




On Sat, Sep 17, 2016 at 1:36 PM, kant kodali <kanth...@gmail.com> wrote:




> can one topic be registered in multiple brokers? if so, which component of

> kafka decides which broker should get the message for that particular

> topic?

> Thanks!
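>
> To make the answer above concrete, here is a hedged sketch using the Java AdminClient
> from newer Kafka releases (broker addresses, topic name, and counts are placeholders):
> a topic with three partitions and replication factor three is spread across three
> brokers, and the producer's partitioner, not Zookeeper, picks the partition for each
> message; the client then sends it to that partition's current leader.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateSpreadTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092"); // assumption
            try (AdminClient admin = AdminClient.create(props)) {
                // 3 partitions, replication factor 3: each broker leads one partition
                // and holds follower replicas of the other two.
                NewTopic topic = new NewTopic("kafka_test", 3, (short) 3); // placeholder topic
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }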

can one topic be registered in multiple brokers?

2016-09-17 Thread kant kodali
Can one topic be registered in multiple brokers? If so, which component of
Kafka decides which broker should get the message for that particular topic?
Thanks!

any update on this?

2016-09-17 Thread kant kodali

https://issues.apache.org/jira/browse/KAFKA-1793
It would be great to be able to use Consul instead of Zookeeper for Kafka, and I think
Kafka would benefit a lot from the rapidly growing Consul community.

Re: which port should I use 9091 or 9092 or 2181 to send messages through kafka when using a client Library?

2016-09-15 Thread kant kodali
@Umesh According to these examples it looks like both the producer and consumer
specify bootstrap.servers. What is PLAINTEXT? Do I need to change something
here: https://github.com/apache/kafka/blob/trunk/config/server.properties ?
Because when I specify port 9092 for both the producer and consumer, or just either of
them, it doesn't seem to work. Only when I specify the Zookeeper port does it seem to
work, and I don't know why.

https://github.com/apache/kafka/blob/trunk/examples/src/main/java/kafka/examples/Producer.java#L34
https://github.com/apache/kafka/blob/trunk/examples/src/main/java/kafka/examples/Consumer.java

 





On Thu, Sep 15, 2016 8:15 AM, UMESH CHAUDHARY umesh9...@gmail.com
wrote:
No that is not required, when you use new consumer API. You have to

specify bootstrap.servers,

which will have 9092 (for PLAINTEXT usually ).

In old consumer API you need zookeeper server which points on 2181.




On Thu, 15 Sep 2016 at 17:03 kant kodali <kanth...@gmail.com> wrote:




> I haven't changed anything from

> https://github.com/apache/kafka/blob/trunk/config/server.properties

> and it looks like it is pointing to zookeeper.

> Question:

> Does producer client need to point 9092 and Consumer need to point 2181?

> is that

> the standard? Why not both point to the same thing?

>

>

>

>

>

>

> On Thu, Sep 15, 2016 4:24 AM, Ali Akhtar ali.rac...@gmail.com

> wrote:

> Examine server.properties and see which port you're using in there

>

>

>

>

> On Thu, Sep 15, 2016 at 3:52 PM, kant kodali <kanth...@gmail.com> wrote:

>

>

>

>

> > which port should I use 9091 or 9092 or 2181 to send messages through

> kafka

>

> > when using a client Library?

>

> > I start kafka as follows:

>

> > sudo bin/zookeeper-server-start.sh config/zookeeper.propertiessudo

>

> > ./bin/kafka-server-start.sh config/server.properties

>

> >

>

> > and I dont see any process running on 9091 or 9092 however lot of client

>

> > library

>

> > examples have a consumer client pointing to 9092. for example here

>

> > https://github.com/apache/kafka/blob/trunk/examples/src/main

>

> > /java/kafka/examples/Producer.java#L34

>

> > shouldn't both producer and consumer point to zookeeper port 2181? which

> I

>

> > am

>

> > assuming will do the lookup?

>

> > Thanks,Kant
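>
> To make the port question concrete: with the new Java clients, both the producer and
> the consumer take bootstrap.servers pointing at the broker listener (9092 by default;
> PLAINTEXT just means the unencrypted listener), and neither talks to Zookeeper on 2181.
> A minimal producer sketch, assuming a local broker and a placeholder topic:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PortExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Broker listener, not Zookeeper: 2181 is used by the brokers themselves
            // (and by the old consumer API and some CLI tools), not by these clients.
            props.put("bootstrap.servers", "localhost:9092"); // assumption: default PLAINTEXT listener
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("kafka_test", "hello")); // placeholder topic
                producer.flush();
            }
        }
    }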

Re: which port should I use 9091 or 9092 or 2181 to send messages through kafka when using a client Library?

2016-09-15 Thread kant kodali

I haven't changed anything in
https://github.com/apache/kafka/blob/trunk/config/server.properties
and it looks like it is pointing to Zookeeper.
Question:
Does the producer client need to point to 9092 and the consumer to 2181? Is that
the standard? Why don't both point to the same thing?






On Thu, Sep 15, 2016 4:24 AM, Ali Akhtar ali.rac...@gmail.com
wrote:
Examine server.properties and see which port you're using in there




On Thu, Sep 15, 2016 at 3:52 PM, kant kodali <kanth...@gmail.com> wrote:





which port should I use 9091 or 9092 or 2181 to send messages through kafka



when using a client Library?



I start kafka as follows:



sudo bin/zookeeper-server-start.sh config/zookeeper.propertiessudo



./bin/kafka-server-start.sh config/server.properties







and I dont see any process running on 9091 or 9092 however lot of client



library



examples have a consumer client pointing to 9092. for example here



https://github.com/apache/kafka/blob/trunk/examples/src/main



/java/kafka/examples/Producer.java#L34



shouldn't both producer and consumer point to zookeeper port 2181? which I



am



assuming will do the lookup?



Thanks,Kant

which port should I use 9091 or 9092 or 2181 to send messages through kafka when using a client Library?

2016-09-15 Thread kant kodali

Which port should I use (9091, 9092, or 2181) to send messages through Kafka
when using a client library?
I start Kafka as follows:

    sudo bin/zookeeper-server-start.sh config/zookeeper.properties
    sudo ./bin/kafka-server-start.sh config/server.properties

and I don't see any process running on 9091 or 9092; however, a lot of client library
examples have a consumer client pointing to 9092, for example here:
https://github.com/apache/kafka/blob/trunk/examples/src/main/java/kafka/examples/Producer.java#L34
Shouldn't both producer and consumer point to Zookeeper port 2181, which I am
assuming will do the lookup?
Thanks,
Kant

Re: What is the fair setup of Kafka to be comparable with NATS or NSQ?

2016-09-15 Thread kant kodali

Hi Ben,
Also, the article that you pointed out clearly shows the setup had multiple
partitions and multiple workers and so on... that's why, at this point, my biggest
question is: what is the fair setup for Kafka so it's comparable with NATS and
NSQ? And since you suspect the client library, I can give that a go.. but can you
please confirm that one partition on one broker should be able to handle 300K
messages of 1KB data size each?
Thanks,
kant






On Thu, Sep 15, 2016 2:28 AM, kant kodali kanth...@gmail.com
wrote:
Hi Ben,
I can give that a try but can you tell me the suspicion or motivation behind it?
other words you think single partition and single broker should be comparable to
the setup I had with NATS and NSQ except you suspect the client library or
something?
Thanks,Kant






On Thu, Sep 15, 2016 2:16 AM, Ben Davison ben.davi...@7digital.com
wrote:
Hi Kant,




I was following the other thread, can you try using a different

benchmarking client for a test.




https://grey-boundary.io/load-testing-apache-kafka-on-aws/




Ben




On Thursday, 15 September 2016, kant kodali <kanth...@gmail.com> wrote:





with Kafka I tried it with 10 messages with single broker and only one



partiton



that looked instantaneous and ~5K messages/sec for the data size of 1KB



I tried it with 1000 messages that looked instantaneous as well ~5K



messages/sec



for the data size of 1KBI tried it with 10K messages with single broker



and only



one partiton things started to go down ~1K messages/sec for the data size



of



1KB



having only one partition on a single broker is a bad? My goal is to run



some



basic benchmarks on NATS & NSQ & KAFKA



I have the same environment for all three (NATS & NSQ & KAFKA)



a broker on Machine 1producer on Machine 2Consumer on Machine 3



with a data size of 1KB (so each message is 1KB ) and m4.xlarge aws



instance.



I have pushed 300K messages with NATS and it was able to handle easily and



receive throughput was 5K messages/secI have pushed 300K messages and NSQ



and



receive throughput was 2K messages/secI am unable to push 300K messages



with



Kafka with the above configuration and environmentso at this point my



biggest



question is what is the fair setup for Kafka so its comparable with NATS



and



NSQ?



kant





--








Re: What is the fair setup of Kafka to be comparable with NATS or NSQ?

2016-09-15 Thread kant kodali

Hi Ben,
I can give that a try, but can you tell me the suspicion or motivation behind it?
In other words, you think a single partition and single broker should be comparable to
the setup I had with NATS and NSQ, except you suspect the client library or
something?
Thanks,
Kant






On Thu, Sep 15, 2016 2:16 AM, Ben Davison ben.davi...@7digital.com
wrote:
Hi Kant,




I was following the other thread, can you try using a different

benchmarking client for a test.




https://grey-boundary.io/load-testing-apache-kafka-on-aws/




Ben




On Thursday, 15 September 2016, kant kodali <kanth...@gmail.com> wrote:





with Kafka I tried it with 10 messages with single broker and only one



partiton



that looked instantaneous and ~5K messages/sec for the data size of 1KB



I tried it with 1000 messages that looked instantaneous as well ~5K



messages/sec



for the data size of 1KBI tried it with 10K messages with single broker



and only



one partiton things started to go down ~1K messages/sec for the data size



of



1KB



having only one partition on a single broker is a bad? My goal is to run



some



basic benchmarks on NATS & NSQ & KAFKA



I have the same environment for all three (NATS & NSQ & KAFKA)



a broker on Machine 1producer on Machine 2Consumer on Machine 3



with a data size of 1KB (so each message is 1KB ) and m4.xlarge aws



instance.



I have pushed 300K messages with NATS and it was able to handle easily and



receive throughput was 5K messages/secI have pushed 300K messages and NSQ



and



receive throughput was 2K messages/secI am unable to push 300K messages



with



Kafka with the above configuration and environmentso at this point my



biggest



question is what is the fair setup for Kafka so its comparable with NATS



and



NSQ?



kant





--








What is the fair setup of Kafka to be comparable with NATS or NSQ?

2016-09-15 Thread kant kodali
With Kafka, I tried 10 messages with a single broker and only one partition;
that looked instantaneous, ~5K messages/sec for a data size of 1KB.
I tried it with 1000 messages; that looked instantaneous as well, ~5K messages/sec
for a data size of 1KB.
I tried it with 10K messages with a single broker and only one partition; things
started to go down, ~1K messages/sec for a data size of 1KB.
Is having only one partition on a single broker bad? My goal is to run some
basic benchmarks on NATS & NSQ & KAFKA.
I have the same environment for all three (NATS & NSQ & KAFKA):
a broker on Machine 1, producer on Machine 2, consumer on Machine 3,
with a data size of 1KB (so each message is 1KB) and m4.xlarge AWS instances.
I have pushed 300K messages with NATS and it handled them easily; receive
throughput was 5K messages/sec. I have pushed 300K messages with NSQ and
receive throughput was 2K messages/sec. I am unable to push 300K messages with
Kafka with the above configuration and environment. So at this point my biggest
question is: what is the fair setup for Kafka so it's comparable with NATS and
NSQ?
kant
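
As a rough single-broker baseline, here is a hedged Java sketch of the same experiment:
async sends of 1 KB messages with acks=1, measuring messages per second. The broker
address, topic name, and message count are placeholders:

    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ProducerThroughput {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumption: single local broker
            props.put("acks", "1");                           // same ack level as the node.js test
            props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

            byte[] payload = new byte[1024];                  // 1 KB message
            Arrays.fill(payload, (byte) 'x');
            int count = 300_000;                              // placeholder message count

            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                long start = System.currentTimeMillis();
                for (int i = 0; i < count; i++) {
                    producer.send(new ProducerRecord<>("kafka_test", payload)); // async send
                }
                producer.flush();                             // block until everything is acked
                long elapsedMs = Math.max(1, System.currentTimeMillis() - start);
                System.out.println("sent " + count + " msgs at ~" + (count * 1000L / elapsedMs) + " msgs/sec");
            }
        }
    }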

Re: hi

2016-09-15 Thread kant kodali

I used node.js client libraries for all three, and yes, I want to make sure I am
comparing apples to apples, so I made the setups as equivalent as possible.
Again, the big question is: what is the right setup for Kafka to be comparable
with the others I mentioned in my previous email?






On Thu, Sep 15, 2016 1:47 AM, Ali Akhtar ali.rac...@gmail.com
wrote:
The issue is clearly that you're running out of resources, so I would add

more brokers and/or larger instances.




You're also using Node which is not the best for performance. A compiled

language such as Java would give you the best performance.




Here's a case study that should help:

https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines




Good luck, let us know how it goes




On Thu, Sep 15, 2016 at 1:42 PM, kant kodali <kanth...@gmail.com> wrote:





yeah..



I tried it with 10 messages with single broker and only one partiton that



looked



instantaneous and ~5K messages/sec for the data size of 1KBI tried it with



1000



messages that looked instantaneous as well ~5K messages/sec for the data



size of



1KBI tried it with 10K messages with single broker and only one



partiton things



started to go down ~1K messages/sec for the data size of 1KB



having only one partition on a single broker is a bad? My goal is to run



some



basic benchmarks on NATS & NSQ & KAFKA



I have the same environment for all three (NATS & NSQ & KAFKA)



a broker on Machine 1producer on Machine 2Consumer on Machine 3



with a data size of 1KB (so each message is 1KB ) and m4.xlarge aws



instance.



I have pushed 300K messages with NATS and it was able to handle easily and



receive throughput was 5K messages/secI have pushed 300K messages and NSQ



and



receive throughput was 2K messages/secI am unable to push 300K messages



with



Kafka with the above configuration and environment so at this point my



biggest



question is what is the fair setup for Kafka so its comparable with NATS



and



NSQ?



kant



























On Thu, Sep 15, 2016 12:43 AM, Ali Akhtar ali.rac...@gmail.com



wrote:



Lower the workload gradually, start from 10 messages, increase to 100, then







1000, and so on. See if it slows down as the workload increases. If so, you







need more brokers + partitions to handle the workload.



















On Thu, Sep 15, 2016 at 12:42 PM, kant kodali <kanth...@gmail.com> wrote:



















> m4.xlarge







>







>







>







>







>







>







> On Thu, Sep 15, 2016 12:33 AM, Ali Akhtar ali.rac...@gmail.com







>







> wrote:







> What's the instance size that you're using? With 300k messages your



single







>







> broker might not be able to handle it.







>







>







>







>







> On Thu, Sep 15, 2016 at 12:30 PM, kant kodali <kanth...@gmail.com>



wrote:







>







>







>







>







> My goal is to test the throughput (#messages per second) given my setup



and







>>







>







> with a data size of 1KB. if you guys already have some idea on these







>>







>







> numbers







>>







>







> that would be helpful as well.







> On Thu, Sep 15, 2016 12:24 AM, kant kodali kanth...@gmail.com







>>







>







> wrote:







>>







>







> 172.* is all private ip's for my machine I double checked it.I have not







>>







>







> changed







>>







>







> any default settingsI dont know how to use kafka-consumer.sh







>>







>







> or kafka-producer.sh because it looks like they want me to specify a



group







>>







>







> and I







>>







>







> didn't create any consumer group because I am using single producer and







>>







>







> consumer. is there a default group?Also, I am receiving message but very







>>







>







> late. I







>>







>







> send about 300K messages using the node.js client and I am receiving at a







>>







>







> very







>>







>







> low rate. really not sure what is going on?







>>







>







>







> On Thu, Sep 15, 2016 12:06 AM, Ali Akhtar ali.rac...@gmail.com







>>







>







> wrote:







>>







>







&g

Re: hi

2016-09-15 Thread kant kodali
Yeah..
I tried it with 10 messages with a single broker and only one partition; that looked
instantaneous, ~5K messages/sec for a data size of 1KB. I tried it with 1000
messages; that looked instantaneous as well, ~5K messages/sec for a data size of
1KB. I tried it with 10K messages with a single broker and only one partition; things
started to go down, ~1K messages/sec for a data size of 1KB.
Is having only one partition on a single broker bad? My goal is to run some
basic benchmarks on NATS & NSQ & KAFKA.
I have the same environment for all three (NATS & NSQ & KAFKA):
a broker on Machine 1, producer on Machine 2, consumer on Machine 3,
with a data size of 1KB (so each message is 1KB) and m4.xlarge AWS instances.
I have pushed 300K messages with NATS and it handled them easily; receive
throughput was 5K messages/sec. I have pushed 300K messages with NSQ and
receive throughput was 2K messages/sec. I am unable to push 300K messages with
Kafka with the above configuration and environment, so at this point my biggest
question is: what is the fair setup for Kafka so it's comparable with NATS and
NSQ?
kant
 





On Thu, Sep 15, 2016 12:43 AM, Ali Akhtar ali.rac...@gmail.com
wrote:
Lower the workload gradually, start from 10 messages, increase to 100, then

1000, and so on. See if it slows down as the workload increases. If so, you

need more brokers + partitions to handle the workload.




On Thu, Sep 15, 2016 at 12:42 PM, kant kodali <kanth...@gmail.com> wrote:




> m4.xlarge

>

>

>

>

>

>

> On Thu, Sep 15, 2016 12:33 AM, Ali Akhtar ali.rac...@gmail.com

>

> wrote:

> What's the instance size that you're using? With 300k messages your single

>

> broker might not be able to handle it.

>

>

>

>

> On Thu, Sep 15, 2016 at 12:30 PM, kant kodali <kanth...@gmail.com> wrote:

>

>

>

>

> My goal is to test the throughput (#messages per second) given my setup and

>>

>

> with a data size of 1KB. if you guys already have some idea on these

>>

>

> numbers

>>

>

> that would be helpful as well.

>>

>

>

>>

>

>>

>

>>

>

>>

>

>>

>

>>

> On Thu, Sep 15, 2016 12:24 AM, kant kodali kanth...@gmail.com

>>

>

> wrote:

>>

>

> 172.* is all private ip's for my machine I double checked it.I have not

>>

>

> changed

>>

>

> any default settingsI dont know how to use kafka-consumer.sh

>>

>

> or kafka-producer.sh because it looks like they want me to specify a group

>>

>

> and I

>>

>

> didn't create any consumer group because I am using single producer and

>>

>

> consumer. is there a default group?Also, I am receiving message but very

>>

>

> late. I

>>

>

> send about 300K messages using the node.js client and I am receiving at a

>>

>

> very

>>

>

> low rate. really not sure what is going on?

>>

>

>

>>

>

>>

>

>>

>

>>

>

>>

>

>>

> On Thu, Sep 15, 2016 12:06 AM, Ali Akhtar ali.rac...@gmail.com

>>

>

> wrote:

>>

>

> Your code seems to be using the public ip of the servers. If all 3 machines

>>

>

>

>>

> are in the same availability zone on AWS, try using the private ip, and

>>

>

>

>>

> then they might communicate over the local network.

>>

>

>

>>

>

>>

>

>>

>

>>

> Did you change any default settings?

>>

>

>

>>

>

>>

>

>>

>

>>

> Do you get the same results if you run kafka-consumer.sh and

>>

>

>

>>

> kafka-producer.sh instead of the Node code?

>>

>

>

>>

>

>>

>

>>

>

>>

> On Thu, Sep 15, 2016 at 12:01 PM, kant kodali <kanth...@gmail.com> wrote:

>>

>

>

>>

>

>>

>

>>

>

>>

> > They are hosted on AWS and I dont think there are any network issues

>>

>

>

>>

> > because I

>>

>

>

>>

> > tried testing other Queuing systems with no issues however I am using a

>>

>

>

>>

> > node.js

>>

>

>

>>

> > client with the following code. I am not sure if there are any errors or

>>

>

>

>>

> > anything I didn't set in the following code?

>>

>

>

>>

> >

>>

>

>

>>

> >

>>

>

>

>>

> > //producer var kafka = require('kafka-node'); var

>>

>

>

>>

> > Producer = kafka.Producer; var Client = kafka.Client;

Re: hi

2016-09-15 Thread kant kodali

m4.xlarge






On Thu, Sep 15, 2016 12:33 AM, Ali Akhtar ali.rac...@gmail.com
wrote:
What's the instance size that you're using? With 300k messages your single

broker might not be able to handle it.




On Thu, Sep 15, 2016 at 12:30 PM, kant kodali <kanth...@gmail.com> wrote:





My goal is to test the throughput (#messages per second) given my setup and



with a data size of 1KB. if you guys already have some idea on these



numbers



that would be helpful as well.



























On Thu, Sep 15, 2016 12:24 AM, kant kodali kanth...@gmail.com



wrote:



172.* is all private ip's for my machine I double checked it.I have not



changed



any default settingsI dont know how to use kafka-consumer.sh



or kafka-producer.sh because it looks like they want me to specify a group



and I



didn't create any consumer group because I am using single producer and



consumer. is there a default group?Also, I am receiving message but very



late. I



send about 300K messages using the node.js client and I am receiving at a



very



low rate. really not sure what is going on?



























On Thu, Sep 15, 2016 12:06 AM, Ali Akhtar ali.rac...@gmail.com



wrote:



Your code seems to be using the public ip of the servers. If all 3 machines







are in the same availability zone on AWS, try using the private ip, and







then they might communicate over the local network.



















Did you change any default settings?



















Do you get the same results if you run kafka-consumer.sh and







kafka-producer.sh instead of the Node code?



















On Thu, Sep 15, 2016 at 12:01 PM, kant kodali <kanth...@gmail.com> wrote:



















> They are hosted on AWS and I dont think there are any network issues







> because I







> tried testing other Queuing systems with no issues however I am using a







> node.js







> client with the following code. I am not sure if there are any errors or







> anything I didn't set in the following code?







>







>







> //producer var kafka = require('kafka-node'); var







> Producer = kafka.Producer; var Client = kafka.Client; var client =







> new Client('172.31.21.175:2181'); var argv =







> require('optimist').argv; var topic = argv.topic || 'kafka_test'; var







> p = argv.p || 0; var a = argv.a || 0; var producer = new







> Producer(client, { requireAcks: 1}); var num = 35;







> producer.on('ready', function () { var message = 'Hello World';







> for (var i=0; i<num; i++) { producer.send([ { topic:







> topic, partition: p, messages: message, attributes: a } ], function







> (err, result) { console.log(err || result);







> //process.exit(); }); } }); producer.on('error',







> function (err) { console.log('error', err); process.exit();







> }); //Consumer var kafka = require('kafka-node'); var Consumer =







> kafka.Consumer; var Offset = kafka.Offset; var Client =







> kafka.Client; var argv = require('optimist').argv; var topic =







> argv.topic || 'kafka_test'; var client = new







> Client('172.31.21.175:2181'); var topics = [ {topic: topic,







> partition: 0} ]; var options = { autoCommit: false, fetchMaxWaitMs:







> 1000 }; var consumer = new Consumer(client, topics, options); var







> offset = new Offset(client); var start; var received = 0; var







> target = 20; var hash = 1000; consumer.on('message', function







> (message) { console.log(message); received += 1; if







> (received === 1) { start = new Date(); } if (received === target) {







> var stop = new Date(); console.log('\nDone test');







> var mps = parseInt(target/((stop-start)/1000));







> console.log('Received at ' + mps + ' msgs/sec'); process.exit();







> } else if (received % hash === 0){







> process.stdout.write(received + '\n'); } });







> consumer.on('error', function (err) { console.log('error', err); });







>







> Not using Mixmax yet?







>







>







>







>







>







>







>







> On Wed, Sep 14, 2016 11:58 PM, Ali Akhtar ali.rac...@gmail.com







> wrote:







> It sounds like a network issue. Where are the 3 servers located / hosted?







>







>







>







>







> On Thu, Sep 15, 2016 at 11:51 AM, kant kodali <kanth...@gmail.com>



wrote:







>







>







>







>







> Hi,







>>







>







> I have the following setup.







>>







>







> Single Kafka broker and Zookeeper on Machine 1single Kafka producer on







>>







>







> Machine 2







>>







>







> Single Kafka Consumer on Machine 3







>>







&

Re: hi

2016-09-15 Thread kant kodali
My goal is to test the throughput (messages per second) given my setup and
with a data size of 1KB. If you guys already have some idea of these numbers,
that would be helpful as well.
 





On Thu, Sep 15, 2016 12:24 AM, kant kodali kanth...@gmail.com
wrote:
172.* is all private ip's for my machine I double checked it.I have not changed
any default settingsI dont know how to use  kafka-consumer.sh
or kafka-producer.sh because it looks like they want me to specify a group and I
didn't create any  consumer group because I am using single producer and
consumer. is there a default group?Also, I am receiving message but very late. I
send about 300K messages using the node.js client and I am receiving at a very
low rate. really not sure what is going on?
 





On Thu, Sep 15, 2016 12:06 AM, Ali Akhtar ali.rac...@gmail.com
wrote:
Your code seems to be using the public ip of the servers. If all 3 machines

are in the same availability zone on AWS, try using the private ip, and

then they might communicate over the local network.




Did you change any default settings?




Do you get the same results if you run kafka-consumer.sh and

kafka-producer.sh instead of the Node code?




On Thu, Sep 15, 2016 at 12:01 PM, kant kodali <kanth...@gmail.com> wrote:




> They are hosted on AWS and I dont think there are any network issues

> because I

> tried testing other Queuing systems with no issues however I am using a

> node.js

> client with the following code. I am not sure if there are any errors or

> anything I didn't set in the following code?

>

>

> //producer var kafka = require('kafka-node'); var

> Producer = kafka.Producer; var Client = kafka.Client; var client =

> new Client('172.31.21.175:2181'); var argv =

> require('optimist').argv; var topic = argv.topic || 'kafka_test'; var

> p = argv.p || 0; var a = argv.a || 0; var producer = new

> Producer(client, { requireAcks: 1}); var num = 35;

> producer.on('ready', function () { var message = 'Hello World';

> for (var i=0; i<num; i++) { producer.send([ { topic:

> topic, partition: p, messages: message, attributes: a } ], function

> (err, result) { console.log(err || result);

> //process.exit(); }); } }); producer.on('error',

> function (err) { console.log('error', err); process.exit();

> }); //Consumer var kafka = require('kafka-node'); var Consumer =

> kafka.Consumer; var Offset = kafka.Offset; var Client =

> kafka.Client; var argv = require('optimist').argv; var topic =

> argv.topic || 'kafka_test'; var client = new

> Client('172.31.21.175:2181'); var topics = [ {topic: topic,

> partition: 0} ]; var options = { autoCommit: false, fetchMaxWaitMs:

> 1000 }; var consumer = new Consumer(client, topics, options); var

> offset = new Offset(client); var start; var received = 0; var

> target = 20; var hash = 1000; consumer.on('message', function

> (message) { console.log(message); received += 1; if

> (received === 1) { start = new Date(); } if (received === target) {

> var stop = new Date(); console.log('\nDone test');

> var mps = parseInt(target/((stop-start)/1000));

> console.log('Received at ' + mps + ' msgs/sec'); process.exit();

> } else if (received % hash === 0){

> process.stdout.write(received + '\n'); } });

> consumer.on('error', function (err) { console.log('error', err); });

>

> Not using Mixmax yet?

>

>

>

>

>

>

>

> On Wed, Sep 14, 2016 11:58 PM, Ali Akhtar ali.rac...@gmail.com

> wrote:

> It sounds like a network issue. Where are the 3 servers located / hosted?

>

>

>

>

> On Thu, Sep 15, 2016 at 11:51 AM, kant kodali <kanth...@gmail.com> wrote:

>

>

>

>

> Hi,

>>

>

> I have the following setup.

>>

>

> Single Kafka broker and Zookeeper on Machine 1single Kafka producer on

>>

>

> Machine 2

>>

>

> Single Kafka Consumer on Machine 3

>>

>

> When a producer client sends a message to the Kafka broker by pointing at

>>

>

> the

>>

>

> Zookeeper Server the consumer doesn't seem to get the message right away

>>

>

> instead

>>

>

> it gets after a minute or something (pretty late). I am not sure what

>>

>

> settings I

>>

>

> need to change. any ideas?

>>

>

> Thanks,kant

>

>

Re: hi

2016-09-15 Thread kant kodali
172.* are all private IPs for my machines; I double-checked it. I have not changed
any default settings. I don't know how to use kafka-consumer.sh
or kafka-producer.sh, because it looks like they want me to specify a group, and I
didn't create any consumer group since I am using a single producer and
consumer. Is there a default group? Also, I am receiving messages, but very late. I
send about 300K messages using the node.js client and I am receiving at a very
low rate. I'm really not sure what is going on.
 





On Thu, Sep 15, 2016 12:06 AM, Ali Akhtar ali.rac...@gmail.com
wrote:
Your code seems to be using the public ip of the servers. If all 3 machines

are in the same availability zone on AWS, try using the private ip, and

then they might communicate over the local network.




Did you change any default settings?




Do you get the same results if you run kafka-consumer.sh and

kafka-producer.sh instead of the Node code?




On Thu, Sep 15, 2016 at 12:01 PM, kant kodali <kanth...@gmail.com> wrote:




> They are hosted on AWS and I dont think there are any network issues

> because I

> tried testing other Queuing systems with no issues however I am using a

> node.js

> client with the following code. I am not sure if there are any errors or

> anything I didn't set in the following code?

>

>

> //producer var kafka = require('kafka-node'); var

> Producer = kafka.Producer; var Client = kafka.Client; var client =

> new Client('172.31.21.175:2181'); var argv =

> require('optimist').argv; var topic = argv.topic || 'kafka_test'; var

> p = argv.p || 0; var a = argv.a || 0; var producer = new

> Producer(client, { requireAcks: 1}); var num = 35;

> producer.on('ready', function () { var message = 'Hello World';

> for (var i=0; i<num; i++) { producer.send([ { topic:

> topic, partition: p, messages: message, attributes: a } ], function

> (err, result) { console.log(err || result);

> //process.exit(); }); } }); producer.on('error',

> function (err) { console.log('error', err); process.exit();

> }); //Consumer var kafka = require('kafka-node'); var Consumer =

> kafka.Consumer; var Offset = kafka.Offset; var Client =

> kafka.Client; var argv = require('optimist').argv; var topic =

> argv.topic || 'kafka_test'; var client = new

> Client('172.31.21.175:2181'); var topics = [ {topic: topic,

> partition: 0} ]; var options = { autoCommit: false, fetchMaxWaitMs:

> 1000 }; var consumer = new Consumer(client, topics, options); var

> offset = new Offset(client); var start; var received = 0; var

> target = 20; var hash = 1000; consumer.on('message', function

> (message) { console.log(message); received += 1; if

> (received === 1) { start = new Date(); } if (received === target) {

> var stop = new Date(); console.log('\nDone test');

> var mps = parseInt(target/((stop-start)/1000));

> console.log('Received at ' + mps + ' msgs/sec'); process.exit();

> } else if (received % hash === 0){

> process.stdout.write(received + '\n'); } });

> consumer.on('error', function (err) { console.log('error', err); });

>

> Not using Mixmax yet?

>

>

>

>

>

>

>

> On Wed, Sep 14, 2016 11:58 PM, Ali Akhtar ali.rac...@gmail.com

> wrote:

> It sounds like a network issue. Where are the 3 servers located / hosted?

>

>

>

>

> On Thu, Sep 15, 2016 at 11:51 AM, kant kodali <kanth...@gmail.com> wrote:

>

>

>

>

> Hi,

>>

>

> I have the following setup.

>>

>

> Single Kafka broker and Zookeeper on Machine 1single Kafka producer on

>>

>

> Machine 2

>>

>

> Single Kafka Consumer on Machine 3

>>

>

> When a producer client sends a message to the Kafka broker by pointing at

>>

>

> the

>>

>

> Zookeeper Server the consumer doesn't seem to get the message right away

>>

>

> instead

>>

>

> it gets after a minute or something (pretty late). I am not sure what

>>

>

> settings I

>>

>

> need to change. any ideas?

>>

>

> Thanks,kant

>

>

Re: hi

2016-09-15 Thread kant kodali

They are hosted on AWS and I don't think there are any network issues, because I
tried testing other queuing systems with no issues. However, I am using a node.js
client with the following code. I am not sure if there are any errors or
anything I didn't set in the following code.


    //producer
    var kafka = require('kafka-node');
    var Producer = kafka.Producer;
    var Client = kafka.Client;
    var client = new Client('172.31.21.175:2181');
    var argv = require('optimist').argv;
    var topic = argv.topic || 'kafka_test';
    var p = argv.p || 0;
    var a = argv.a || 0;
    var producer = new Producer(client, { requireAcks: 1 });
    var num = 35;
    producer.on('ready', function () {
      var message = 'Hello World';
      for (var i = 0; i < num; i++) {
        producer.send([
          { topic: topic, partition: p, messages: message, attributes: a }
        ], function (err, result) {
          console.log(err || result);
          //process.exit();
        });
      }
    });
    producer.on('error', function (err) { console.log('error', err); process.exit(); });

    //Consumer
    var kafka = require('kafka-node');
    var Consumer = kafka.Consumer;
    var Offset = kafka.Offset;
    var Client = kafka.Client;
    var argv = require('optimist').argv;
    var topic = argv.topic || 'kafka_test';
    var client = new Client('172.31.21.175:2181');
    var topics = [{ topic: topic, partition: 0 }];
    var options = { autoCommit: false, fetchMaxWaitMs: 1000 };
    var consumer = new Consumer(client, topics, options);
    var offset = new Offset(client);
    var start; var received = 0; var target = 20; var hash = 1000;
    consumer.on('message', function (message) {
      console.log(message);
      received += 1;
      if (received === 1) { start = new Date(); }
      if (received === target) {
        var stop = new Date();
        console.log('\nDone test');
        var mps = parseInt(target / ((stop - start) / 1000));
        console.log('Received at ' + mps + ' msgs/sec');
        process.exit();
      } else if (received % hash === 0) {
        process.stdout.write(received + '\n');
      }
    });
    consumer.on('error', function (err) { console.log('error', err); });









On Wed, Sep 14, 2016 11:58 PM, Ali Akhtar ali.rac...@gmail.com
wrote:
It sounds like a network issue. Where are the 3 servers located / hosted?




On Thu, Sep 15, 2016 at 11:51 AM, kant kodali <kanth...@gmail.com> wrote:





Hi,



I have the following setup.



Single Kafka broker and Zookeeper on Machine 1single Kafka producer on



Machine 2



Single Kafka Consumer on Machine 3



When a producer client sends a message to the Kafka broker by pointing at



the



Zookeeper Server the consumer doesn't seem to get the message right away



instead



it gets after a minute or something (pretty late). I am not sure what



settings I



need to change. any ideas?



Thanks,kant

hi

2016-09-15 Thread kant kodali

Hi,
I have the following setup:
a single Kafka broker and Zookeeper on Machine 1, a single Kafka producer on Machine 2,
and a single Kafka consumer on Machine 3.
When the producer client sends a message to the Kafka broker (pointing at the
Zookeeper server), the consumer doesn't seem to get the message right away; instead
it arrives after a minute or so (pretty late). I am not sure what settings I
need to change. Any ideas?
Thanks,
kant

Consumer stops after reaching an offset of 1644

2016-09-14 Thread kant kodali
Hi All,
I am trying to do a simple benchmark test for Kafka using a single broker,
producer, and consumer; however, my consumer doesn't seem to receive all the
messages produced by the producer, so I'm not sure what is going on. Any help?
Here is the full description of the problem:
http://stackoverflow.com/questions/39500780/i-am-trying-to-benchmark-kafka-using-node-js-but-it-stops-working-after-certain

Thanks!
kant