GenericRecord.toString produces invalid JSON for logical types
Hi All, I have serialized Avro binary data represented by byte[] where one of the fields is a long with a logical type of timestamp-millis.

Timestamp tsp = new Timestamp(1530228588182L);
Schema schema = SchemaBuilder.builder()
    .record("hello")
    .fields()
    .name("tsp").type(LogicalTypes.timestampMillis().addToSchema(Schema.create(Schema.Type.LONG))).noDefault()
    .endRecord();
System.out.println(schema.toString());

GenericRecord genericRecord = new GenericData.Record(schema);
genericRecord.put("tsp", tsp.getTime());

I serialized the above generic record to byte[] and used the two methods below to deserialize, but both of them produce invalid JSON.

public static GenericRecord deserialize(final Schema schema, byte[] data) throws IOException {
    final GenericData genericData = new GenericData();
    genericData.addLogicalTypeConversion(new TimeConversions.TimestampConversion());
    genericData.addLogicalTypeConversion(new TimeConversions.TimeConversion());
    try (final InputStream is = new ByteArrayInputStream(data)) {
        final Decoder decoder = DecoderFactory.get().binaryDecoder(is, null);
        final DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema, schema, genericData);
        return reader.read(null, decoder);
    }
}

This produces {"tsp": 2018-06-28T23:29:48.182Z}, which is not valid JSON, so I also tried the following:

public static GenericRecord deserialize(final Schema schema, byte[] data) throws IOException {
    final GenericData genericData = new GenericData() {
        @Override
        public String toString(Object datum) {
            StringBuilder buffer = new StringBuilder();
            // Since these types are not quoted and produce a malformed JSON string, quote them here.
            if (datum instanceof java.sql.Timestamp || datum instanceof java.sql.Time
                    || datum instanceof java.sql.Date) {
                return buffer.append("\"").append(datum).append("\"").toString();
            }
            return super.toString(datum);
        }
    };
    genericData.addLogicalTypeConversion(new TimeConversions.TimestampConversion());
    genericData.addLogicalTypeConversion(new TimeConversions.TimeConversion());
    try (final InputStream is = new ByteArrayInputStream(data)) {
        final Decoder decoder = DecoderFactory.get().binaryDecoder(is, null);
        final DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema, schema, genericData);
        return reader.read(null, decoder);
    }
}

I still get {"tsp": 2018-06-28T23:29:48.182Z}, which is not valid JSON.

Expected output: {"tsp": "2018-06-28T23:29:48.182Z"}

Any ideas? Thanks!
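(One workaround, if the goal is valid JSON rather than a particular toString() format: serialize the record through Avro's JsonEncoder, which always emits well-formed JSON. A minimal sketch under that assumption; note it renders the timestamp-millis field as its underlying long, e.g. {"tsp":1530228588182}, rather than an ISO-8601 string:)

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

public static String toJson(final Schema schema, final GenericRecord record) throws IOException {
    final ByteArrayOutputStream out = new ByteArrayOutputStream();
    // JsonEncoder writes the Avro JSON encoding, which is always parseable JSON.
    final JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
    final GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
    writer.write(record, encoder);
    encoder.flush();
    return out.toString("UTF-8");
}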
Re: Messages are repeating in kafka
@Abhimanyu 1) My guess is that topic offsets will remain for 30 days, since that is the configuration you are explicitly setting, and Kafka should respect that, although I don't know for sure. 2) Same as #1: offsets should remain for whatever time you specify. What is the problem with setting offsets.retention.minutes == log.retention.hours? Did it fix the problem you were facing before?

On Tue, May 23, 2017 at 11:40 PM, Abhimanyu Nagrath <abhimanyunagr...@gmail.com> wrote:

> Hi Kant,
>
> After setting this configuration offsets.retention.minutes, I am in doubt
> about two things:
> 1. If I am deleting a topic, will that topic's offsets also get deleted,
> or will they be present for 30 days?
> 2. What will happen if for some topics my log.retention.hours = 168 and
> offsets.retention.minutes = 1440 * 30?
>
> Regards,
> Abhimanyu
>
> On Mon, May 22, 2017 at 4:09 PM, Abhimanyu Nagrath <abhimanyunagr...@gmail.com> wrote:
>
> > @Kant I was going through the offset-related configurations before setting
> > offsets.retention.minutes, so I came across this configuration and thought to
> > ask whether it should also be tuned or not.
> >
> > Regards,
> > Abhimanyu
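(For concreteness, a hedged sketch of the broker settings being discussed, using the hypothetical 30-day numbers from this thread; both go in the broker's server.properties:)

log.retention.hours=720            # keep topic data for 30 days
offsets.retention.minutes=43200    # 1440 * 30: keep committed offsets as long as the data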
Re: Messages are repeating in kafka
@Abhimanyu Why do you think you need to set that? Did you try setting offsets.retention.minutes = 1440 * 30, and are you still seeing duplicates?

On Mon, May 22, 2017 at 12:37 AM, Abhimanyu Nagrath <abhimanyunagr...@gmail.com> wrote:

> Hi Girish,
>
> Do I need to tune this configuration offsets.retention.check.interval.ms
> also? Please let me know if I need to tune any other configuration.
>
> Regards,
> Abhimanyu
>
> On Sun, May 21, 2017 at 8:01 PM, Girish Aher <girisha...@gmail.com> wrote:
>
> > Yup, exactly as Kant said.
> > Also make sure that the retention of the offsets topic is an upper bound
> > across all topics. So in this case, don't create any other topics in the
> > future with retention of more than 30 days, or otherwise they may have the
> > same problem too.
Re: Messages are repeating in kafka
@Abhimanyu You can try setting offset.retention = 30 (log.retention). At most, you will have a storage overhead of 5 million msgs per day * 30 (days) * 8 bytes (for each offset) = 1.2GB (not that much since you have a TB of hard disk)
Re: Messages are repeating in kafka
Looking at that ticket and reading the comments, it looks like one of the concerns is as follows:

"offsets.retention.minutes is designed to handle the case that a consumer group goes away forever. In that case, we don't want to store the offsets for that group forever."

This can simply be addressed by setting offset retention == log retention by default, right? In which case offsets won't be stored forever even when a consumer group goes away forever. When the consumer group goes away forever, the upper bound to clean up offsets would be equal to log retention.
Re: Messages are repeating in kafka
What is your average message size and network speed?

On Sun, May 21, 2017 at 2:04 AM, Abhimanyu Nagrath <abhimanyunagr...@gmail.com> wrote:

> Hi Girish,
>
> I did not set any value for offsets.retention.minutes, so I think it is
> picking up its default value, i.e. 1440 minutes. What do you think I should
> set it to if I am keeping my data for 30 days?
>
> Regards,
> Abhimanyu
Re: is anyone able to create 1M or 10M or 100M or 1B partitions in a topic?
Got it! But do you mean directories or files? I thought partition = file and topic = directory. If there are 1M partitions in a topic, does that mean there are 1M files in one directory?

On Tue, May 16, 2017 at 2:29 AM, Sameer Kumar <sam.kum.w...@gmail.com> wrote:

> I don't think this would be the right approach. From the broker side, this
> would mean creating 1M/10M/100M/1B directories; this would be too much for
> the file system itself.
>
> For most cases, even some thousand partitions per node should be
> sufficient.
>
> For more details, please refer to
> https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
>
> -Sameer.
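(To make the directory/file distinction concrete: each partition is a directory under the broker's log.dirs, and each directory holds that partition's segment files, which roll as the log grows. A hypothetical layout for a topic "my-topic" with two partitions:)

/tmp/kafka-logs/
    my-topic-0/
        00000000000000000000.index
        00000000000000000000.log
    my-topic-1/
        00000000000000000000.index
        00000000000000000000.log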
Re: is anyone able to create 1M or 10M or 100M or 1B partitions in a topic?
Forgot to mention: The question in this thread is for one node which has 8 CPU's 16GB RAM & 500GB hard disk space.
is anyone able to create 1M or 10M or 100M or 1B partitions in a topic?
Hi All,

1. I was wondering if anyone has seen, heard of, or been able to create 1M or 10M or 100M or 1B partitions in a topic? I understand a lot of this depends on filesystem limitations (we are using ext4) and OS limitations, but I would just like to know what scale people have seen in production.
2. Is it advisable?

Thanks!
Kafka Streams vs Flink (or any other stream processing framework)
Hi All,

I have read enough blogs from Confluent and others, and also books, that have tried to talk about the differences between the two, and while it is great to know those differences, I hardly find them useful when it comes to the decision-making process of which one to pick, since I don't see clear boundaries between Kafka Streams and Flink.

To be clear, if I ask which one to pick, I am sure the answer will be "depending on your use case". Well, that's not what I intend to ask. All the use cases I have come across to date can clearly be implemented using either Kafka Streams or Flink (and I can give the benefit of the doubt to someone who has probably seen better use cases), so the decision to pick one becomes very subjective, biased, and just a matter of convenience, for example "oh, Flink would require an additional deployment" or "Kafka is meant for microservice architecture" and so on. Even with these examples I don't see why we cannot implement with their counterpart, so examples like this serve very little value for an engineer's thought process. For sure it will be a great sales talk.

So, what would be really useful to know is: what is possible with Flink that is not possible, or extremely hard to do, with Kafka Streams for any use case, or vice versa? This way one would clearly know where the boundaries are. If Kafka Streams and Flink are in the same race, that is fine too, and it would be good to know that is the case so people can pick whatever they like.

Thanks!
Re: Can multiple Kafka consumers read from the same partition of same topic by default?
Hi,

Thanks. Can you explain a little of what's happening underneath? Does Kafka create different group.ids by default when group.ids are not set? When a group is specified, only one consumer in the group can consume from a particular partition, right?

Thanks!

On Thu, Mar 30, 2017 at 9:00 PM, Matthias J. Sax <matth...@confluent.io> wrote:

> Yes, you can do that.
>
> -Matthias
>
> On 3/30/17 6:09 PM, kant kodali wrote:
> > Hi All,
> >
> > Can multiple Kafka consumers read from the same partition of the same topic
> > by default? By default, I mean since group.id is not mandatory, I am wondering:
> > if I spawn multiple Kafka consumers without specifying any group.id and
> > give them the same topic and partition name, will they be able to read from
> > the same partition? I understand that if I give different group names to
> > each Kafka consumer then all the consumers can read from the same partition.
> >
> > Thanks!
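(A hedged sketch of the different-groups case: consumers in different groups each see every message. Server address, topic, and group names are hypothetical; with manual assign() the group.id only matters for offset commits, not for partition assignment:)

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public static void readPartition(String groupId) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", groupId); // e.g. "group-A" for consumer 1, "group-B" for consumer 2
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        TopicPartition tp = new TopicPartition("my-topic", 0);
        consumer.assign(Collections.singletonList(tp));   // bypass group-based assignment
        consumer.seekToBeginning(Collections.singletonList(tp));
        // poll(Duration) needs kafka-clients >= 2.0; older clients use poll(long)
        for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
            System.out.printf("%s got offset %d: %s%n", groupId, rec.offset(), rec.value());
        }
    }
}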
Can multiple Kafka consumers read from the same partition of same topic by default?
Hi All,

Can multiple Kafka consumers read from the same partition of the same topic by default? By default, I mean since group.id is not mandatory, I am wondering: if I spawn multiple Kafka consumers without specifying any group.id and give them the same topic and partition name, will they be able to read from the same partition? I understand that if I give different group names to each Kafka consumer then all the consumers can read from the same partition.

Thanks!
Should the number of App instances and Zookeeper servers be the same?
Should the number of App instances and ZooKeeper servers be the same? I understand the requirement of 2F+1 servers to tolerate F failures, but this is to tolerate failures of the ZooKeeper instances themselves. What about the number of App instances?

For example, say I have 3 ZooKeeper servers and 2 instances of my App running that are managed by ZooKeeper, and at any given time only one instance of my App will be running in master mode and the other in standby mode. Now, I want to be able to tolerate one instance failure of my App itself (not a ZooKeeper instance), such that the other instance of my App, which was running in standby mode, is elected as the new leader. Would that work? Or must I have 3 instances of my App and 3 ZooKeeper servers? What is the right configuration for the number of App instances and ZooKeeper servers?

Thanks!
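(For what it's worth, a 3-node ZooKeeper ensemble coordinating just 2 app instances is a common setup: the 2F+1 rule applies to the ensemble, while the apps only need the ensemble to agree on a leader. A hedged sketch of such an active/standby election using Apache Curator's LeaderLatch recipe; the connection string and latch path are hypothetical:)

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class AppLeaderElection {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181",           // the 3-server ensemble
                new ExponentialBackoffRetry(1000, 3));  // retry policy
        client.start();
        try (LeaderLatch latch = new LeaderLatch(client, "/myapp/leader")) {
            latch.start();
            latch.await();   // blocks until this instance becomes leader
            runAsMaster();   // hypothetical: the app's active-mode work; a real app
                             // would keep running here until it wants to resign
        }
    }
    private static void runAsMaster() { /* ... */ }
}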
Re: How does Kafka emulate exactly once processing?
Hi Hans,

That's a great answer compared to the paragraphs I read online! I am assuming you meant HDFS? What is JSDC? Any idea which is more common for this kind of use case? Also, can I store offsets to Zookeeper using ZAB instead of using an external store? I am not sure how Zookeeper stores data, but I keep reading that you can (perhaps Zookeeper requires external storage?).

Thanks!

On Wed, Dec 21, 2016 at 5:11 PM, Hans Jespersen <h...@confluent.io> wrote:

> Exactly once Kafka Sink Connectors typically store the offset externally
> in the same atomic write as they store the messages. That way, after a
> crash, they can check the external store (HSFS, JSDC, etc), retrieve the
> last committed offset, and seek to the next message and continue processing
> with no duplicates and exactly once semantics.
>
> -hans
>
> > On Dec 21, 2016, at 4:39 PM, kant kodali <kanth...@gmail.com> wrote:
> >
> > How does Kafka emulate exactly once processing currently? Does it require
> > the producer to send at least once and the consumer to de-dupe?
> >
> > I did do my research but I feel like I am going all over the place, so a
> > simple short answer would be great!
> >
> > Thanks!
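(A hedged sketch of the pattern Hans describes, with a hypothetical SinkStore standing in for HDFS/JDBC/etc.: the data and its offset are committed in one atomic write, and on restart the consumer seeks past the last stored offset:)

import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ExactlyOnceSinkSketch {
    interface SinkStore { // hypothetical sink abstraction
        long lastCommittedOffset(TopicPartition tp);
        void writeAtomically(String value, TopicPartition tp, long offset);
    }

    static void run(KafkaConsumer<String, String> consumer, SinkStore store) {
        TopicPartition tp = new TopicPartition("my-topic", 0);
        consumer.assign(Collections.singletonList(tp));
        // Recover from the sink, not from Kafka: resume right after the last atomic write.
        consumer.seek(tp, store.lastCommittedOffset(tp) + 1);
        while (true) {
            for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                // Data and offset go into the store in ONE transaction; after a crash
                // either both are persisted or neither, so replays are skipped.
                store.writeAtomically(rec.value(), tp, rec.offset());
            }
        }
    }
}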
How does Kafka emulate exactly once processing?
How does Kafka emulate exactly once processing currently? Does it require the producer to send at least once and the consumer to de-dupe?

I did do my research but I feel like I am going all over the place, so a simple short answer would be great!

Thanks!
Re: is there a way to make sure two consumers receive the same message from the broker?
:)

On Tue, Nov 8, 2016 at 1:37 AM, AmirHossein Roozbahany <band...@outlook.com> wrote:

> Excuse me, this part was nonsense: "if the latest update to a document in
> es always wins in Cassandra's LWW, they will 'eventually' 'converge'".
>
> From: AmirHossein Roozbahany<mailto:band...@outlook.com>
> Sent: 11/8/2016 8:16 AM
> To: users@kafka.apache.org<mailto:users@kafka.apache.org>
> Subject: RE: is there a way to make sure two consumers receive the same
> message from the broker?
>
> Generally Cassandra itself is not consistent enough, even with quorum
> read-writes. Say one of the writes fails: the nodes who received the data
> won't roll back, and it might lead to dirty reads, which in turn makes rollback
> logic tricky (if not impossible). You can use linearizable writes, but if you
> want to use them all the time, why bother using Cassandra!?
>
> The important thing about Cassandra is that all of the nodes will
> eventually have the same data (after anti-entropy or read-repair). They are
> "convergent": they will "eventually" converge, and practically they
> converge pretty fast.
>
> I think what you might actually need is to make the two databases
> convergent (not necessarily fully consistent at any given time): if the
> latest update to a document in es always wins in Cassandra's LWW, they
> will "eventually" "converge".
>
> Doing so is easy. As far as I know, es assigns a _version number to a
> document and increases it on every update; now use it in your Cassandra
> INSERT statement as the "writetime". When Cassandra is doing read repair,
> the record with the higher writetime will win.
>
> Using es's document _version field is just one option; you can use
> something from your domain, or kafka's offset, or the machine timestamp (not
> recommended at all).
>
> I hope it could help
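(A hedged sketch of that suggestion with the DataStax Java driver; keyspace, table, and columns are hypothetical. One caveat: Cassandra writetimes are conventionally microseconds since epoch, while ES _version is a small counter; any strictly increasing value per key works for LWW ordering, but then every writer to the table must supply timestamps the same way:)

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class EsVersionAsWritetime {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("myks")) {
            long esVersion = 7L; // hypothetical: the _version ES returned for this doc
            // USING TIMESTAMP makes LWW order follow the ES version instead of wall time.
            session.execute(
                "INSERT INTO documents (id, body) VALUES (?, ?) USING TIMESTAMP ?",
                "doc-42", "{\"count\": 1}", esVersion);
        }
    }
}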
Re: is there a way to make sure two consumers receive the same message from the broker?
And there is this https://github.com/vroyer/elassandra which is still under active development and not sure how they plan to keep up with Apache Cassandra moving forward.
Re: is there a way to make sure two consumers receive the same message from the broker?
Fixing typos:

Hi Tauzell,

Yeah, our users want to query and do aggregations on Elastic Search directly, and we cannot have inconsistent data, because say the writes didn't make it into Cassandra but made it to Elastic Search: then a simple aggregation like count will lead to a wrong answer. But again, as @Hans pointed out, this is no longer a Kafka question, and your solution has merits in its own way, which I really appreciate! Your solution does make writes faster, with probably some performance penalty on the read side, given repairs happen during the read stage in Cassandra. (We could check in both, but since our users query Elastic Search directly there is no way for us to check it in Cassandra; otherwise we could go with your solution as well.)

Basically, we use ES as an index for Cassandra, since secondary indexes in Cassandra (including the latest implementation, SASI) don't work with our use case: we have high-cardinality columns (which means every row in a column is unique), so an index on a high-cardinality column is not very efficient given the underlying data structure used by SASI, but the inverted index used by ES is much faster.

We do use Apache Spark along with Cassandra, and I am trying to explore Succinct http://succinct.cs.berkeley.edu/wp/wordpress/ and if everything works out with Succinct we can get rid of Elastic Search. The only thing that I worry about, and am still testing with Spark, Cassandra and Succinct, is whether the aggregations/computations on a column, or search on a particular Cassandra field/column, can happen in real time given a big dataset (with ES it does, so the goal is to see if we can get somewhere close or perform even better).

Thanks!
Re: is there a way to make sure two consumers receive the same message from the broker?
Hi Tauzell,

Yeah our users want to query, do aggregations on Elastic Search directly and we cannot have inconsistent data because say the writes didn't make it into Cassandra but made it to Elastic search then a simple aggregations like count will lead to a wrong answer but again as @Hans pointed out this is no longer a Kafka question and also your solution has merits in its own way which I really appreciate it! your solution does make writes faster and probably some performance penalty on the read side give repairs happen during the read stage in Cassandra (We could check in both but since our users query elastic search directly there is no way for us to check it in Cassandra we could go with your solution as well).

Basically, we use ES as an index for Cassandra since secondary indexes in Cassandra (including the latest implementation SASI) doesn't work with our use case since we have high cardinality columns (which means every row in a column is unique so index on a high cardinality column is not very efficient given the underlying data structure used by SASI, but with inverted index which is used by ES is much faster).

We do use Apache Spark along with Cassandra and I am trying to explore Succint http://succinct.cs.berkeley.edu/wp/wordpress/ and if everything works out with Succint we can get rid of elastic search. The only thing that I worry and still testing with Spark, Cassandra and Succint is whether If the aggregations/computations of a column or search on particular Cassandra field/column can happen in real time given a big dataset (with ES it does so the goal is to see if we can get somewhere close or perform even better).

Thanks!

On Mon, Nov 7, 2016 at 8:57 AM, Tauzell, Dave <dave.tauz...@surescripts.com> wrote:

> Here is a scenario where this could be useful:
>
>    Add the kafka offset as a field on the record in both Cassandra and
> Elasticsearch.
>
> Now when you get search results from Elasticsearch and look up details in
> Cassandra you can know if they come from the same kafka record. If you
> can use the offset as part of the Cassandra partition key, or as a
> clustering key, then you could specifically retrieve a version of the
> record from Cassandra that matches Elasticsearch (assuming it made it).
>
> If your real goal is to guarantee that the two datasets always have the
> same set of messages from Kafka ... I don't think this is possible.
>
> -Dave
Re: is there a way to make sure two consumers receive the same message from the broker?
Hi AmirHossein,

I still don't see how that guarantees consistency at any given time. In other words, how do I know that at time X the data in Cassandra and ES are the same?

Thanks

On Mon, Nov 7, 2016 at 3:26 AM, AmirHossein Roozbahany <diver...@outlook.com> wrote:

> Hi
>
> Can you use elasticsearch _version field as cassandra's
> writetime? (_version is strictly increasing, cassandra uses writetime for
> applying LWW, so the last write in elasticsearch will always win)
>
> It needs no transaction and makes the databases convergent.
Re: is there a way to make sure two consumers receive the same message from the broker?
Hi Hans,

We currently do #2, and that is quite slow, so yeah, in theory #1 is probably a better choice, although it's not quite what we want, since it doesn't guarantee consistency at any given time, as you have already pointed out.

Thanks a lot for the response!
kant

On Mon, Nov 7, 2016 at 6:31 AM, Hans Jespersen <h...@confluent.io> wrote:

> I don't believe that either of your two storage systems supports distributed
> atomic transactions. You are just going to have to do one of the following:
> 1) update them separately (in parallel) and be aware that their committed
> offsets may be slightly different at certain points in time
> 2) update one, and when you are sure the data is in the first storage,
> update the other storage, and be aware that you need to handle your own
> rollback logic if the second storage system is down or throws an error when
> you try to write to it.
>
> It is very common in the Kafka community to do #1, but in either case this is no
> longer a Kafka question and has become more of a distributed database
> design question.
>
> -hans
Re: is there a way to make sure two consumers receive the same message from the broker?
Hi Hans,

The two storages we use are Cassandra and Elastic Search, and they are in the same datacenter for now. The programming language we use is Java, and the OS would be Ubuntu or CentOS. We get messages in JSON format, so we insert into Elastic Search directly, and for Cassandra we transform the JSON message into the appropriate model so we can insert into a Cassandra table. The rate we currently get is about 100K/sec, which is awesome, but I am pretty sure this will go down once we implement 2PC or transactional writes.

Thanks,
kant
Re: is there a way to make sure two consumers receive the same message from the broker?
Hi! Thanks. Any pointers on how to do that?

On Sun, Nov 6, 2016 at 2:32 PM, Tauzell, Dave <dave.tauz...@surescripts.com> wrote:

> You should have one consumer pull the message and submit the data to each
> storage using an XA transaction.
Re: is there a way to make sure two consumers receive the same message from the broker?
Yes, this problem can definitely be approached in many ways, but given the hard constraints from our clients we don't seem to have many options. The problem is we have to keep two storage systems in sync all the time: whatever data is in storage 1 should also be in storage 2 at any given time. So we explored the following options:

1) We thought about ETL from storage 1 to storage 2, but that approach has a bunch of drawbacks given the time constraints.
2) Add a common service on top of the two storages and do some sort of 2PC, but that would degrade the write performance. Moreover, we don't really have control over how fast each write/store can happen at each storage layer (because these two storages are completely different).

So I started exploring whether there are any tricks I could do with Kafka.

On Sat, Nov 5, 2016 at 5:01 PM, Hans Jespersen <h...@confluent.io> wrote:

> Yes exactly. If consumer 1 gets the message with offset 17, then it can write
> that offset into an external storage that consumer 2 can also check, to
> ensure that it keeps in sync with consumer 1.
>
> Just curious though why you would need to do this? What is the use case,
> because there may be a better way to get you the functionality you want?
>
> -hans
Re: is there a way to make sure two consumers receive the same message from the broker?
I am new to Kafka and reading this statement "write consumer 1 and consumer 2 to share a common external offset storage" I can interpret it many ways but my best guess is as follows.

Are you saying write the current offset of each consumer to a common external storage?
Re: is there a way to make sure two consumers receive the same message from the broker?
Hi Hans,

What do you mean by "write consumer 1 and consumer 2 to share a common external offset storage"? Can you please elaborate a bit more?

Thanks!

On Sat, Nov 5, 2016 at 4:00 PM, Hans Jespersen <h...@confluent.io> wrote:

> There is no built-in mechanism to do this in Apache Kafka, but if you can
> write consumer 1 and consumer 2 to share a common external offset storage,
> then you may be able to build the functionality you seek.
>
> -hans
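(A hedged sketch of what "share a common external offset storage" could look like: consumer 1 records its progress in a shared store, and consumer 2 never reads past it, keeping the two in lockstep. The SharedOffsetStore type is hypothetical; any atomically-updatable external store would do:)

import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class LockstepConsumers {
    interface SharedOffsetStore {                      // hypothetical, e.g. a DB row per partition
        long leaderOffset(TopicPartition tp);          // last offset consumer 1 finished
        void recordOffset(TopicPartition tp, long off);
    }

    // Consumer 1 ("leader") publishes its progress after processing each record:
    static void runLeader(KafkaConsumer<String, String> c, SharedOffsetStore store, TopicPartition tp) {
        c.assign(Collections.singletonList(tp));
        for (ConsumerRecord<String, String> rec : c.poll(Duration.ofMillis(500))) {
            process(rec);
            store.recordOffset(tp, rec.offset());      // visible to consumer 2
        }
    }

    // Consumer 2 ("follower") only consumes up to the leader's recorded offset:
    static void runFollower(KafkaConsumer<String, String> c, SharedOffsetStore store, TopicPartition tp) {
        c.assign(Collections.singletonList(tp));
        long bound = store.leaderOffset(tp);
        for (ConsumerRecord<String, String> rec : c.poll(Duration.ofMillis(500))) {
            if (rec.offset() > bound) break;           // stay in lockstep with consumer 1
            process(rec);
        }
    }

    static void process(ConsumerRecord<String, String> rec) { /* ... */ }
}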
Re: is there a way to make sure two consumers receive the same message from the broker?
Sorry, there is a typo; here is a restatement.

Is there a way to make sure two consumers receive the same message from the Kafka broker in an atomic way? Such that if consumer 1 gets a message, consumer 2 should also get that message, and if consumer 1 fails for whatever reason, consumer 2 should also roll back to the previous offset (to the same offset as consumer 1), or invalidate, or something like that. Is that possible?
is there a way to make sure two consumers receive the same message from the broker?
Is there a way to make sure two consumers receive the same message from the Kafka broker in an atomic way? Such that if consumer 1 gets a message, consumer 2 should also get that message, and in case one of the consumers fails for whatever reason, consumer 2 should also roll back to the previous offset, or invalidate, or something like that. Is that possible?
Re: producer can't push msg sometimes with 1 broker recoved
@Fei Just curious why you guys are interested in using Kafka. I thought Alcatel-Lucent usually creates their own software, no?

On Fri, Sep 23, 2016 10:36 PM, Kamal C kamaltar...@gmail.com wrote:

Reduce the metadata refresh interval 'metadata.max.age.ms' from 5 min to your desired time interval. This may reduce the time window of broker non-availability.

-- Kamal
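(For reference, a hedged sketch of Kamal's suggestion on the client side; the 30-second value and broker address are hypothetical:)

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// Default is 300000 ms (5 minutes); a smaller value makes the client refresh
// cluster metadata sooner, so a recovered broker is noticed faster, at the
// cost of more metadata requests.
props.put("metadata.max.age.ms", "30000");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);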
Re: why did Kafka choose pull instead of push for a consumer ?
@Gerard Here are my initial benchmarks:

Producer on Machine 1 (m4.xlarge on AWS)
Broker on Machine 2 (m4.xlarge on AWS)
Consumer on Machine 3 (m4.xlarge on AWS)
Data size: 1.2KB

Receive throughput: ~24K
Kafka receive throughput: ~58K (same exact configuration)

All the benchmarks I ran are with default options.

So what the Pulsar guys are saying is that Kafka doesn't persist every message by default; instead it batches them for a period of time and then persists, so if the JVM crashes before it persists, all the messages in the batch are lost, whereas Pulsar guarantees strong durability by storing every message to a write-ahead log, so messages are never lost. My question now is: what settings do I need to change in Kafka so it will store every message? That way I am comparing apples to apples.

On Fri, Sep 23, 2016 12:06 AM, Gerard Klijs gerard.kl...@dizzit.com wrote:

I haven't tried it myself, nor very likely will in the near future, but since it's also distributed I guess that with a large enough cluster you will be able to handle any load. One of the things Kafka might be better at is more connectors available, a better at-least-once guarantee, better monitoring options. I really don't know, but if latency is really important Pulsar might be better; they used Kafka before at Yahoo and maybe still do for some stuff; recent work on https://github.com/yahoo/kafka-manager seems to suggest so. Alternatively you could configure a Kafka topic/producer/consumer to limit latency, and that may also be enough to get a low enough latency. It would certainly be interesting to compare the two, with the same hardware, and with high load.
Does Kafka Sync/persist every message from a publisher by default?
Does Kafka sync/persist every message from a publisher by default? If not, what settings should I change so it syncs every message?
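(A hedged sketch of the knobs involved, to the best of my knowledge: by default Kafka relies on replication plus the OS page cache rather than an fsync per message; forcing a flush per message is possible but usually costly. The topic name is hypothetical, and the exact CLI for topic overrides varies by version:)

# Broker-wide (server.properties): fsync after every message
log.flush.interval.messages=1

# Or per topic, via the topic-level override:
#   kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic \
#       --config flush.messages=1

# On the producer side, wait for all in-sync replicas to acknowledge:
acks=all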
Re: why did Kafka choose pull instead of push for a consumer ?
@Gerard Thanks for this. It looks good; any benchmarks on this, throughput-wise?

On Thu, Sep 22, 2016 7:45 AM, Gerard Klijs gerard.kl...@dizzit.com wrote:

We have a simple application producing 1 msg/sec, and did nothing to optimise the performance, and have about a 10 msec delay between consumer and producer. When low latency is important, maybe Pulsar is a better fit: https://www.datanami.com/2016/09/07/yahoos-new-pulsar-kafka-competitor/

On Tue, Sep 20, 2016 at 2:24 PM Michael Freeman <mikfree...@gmail.com> wrote:

> Thanks for sharing Radek, great article.
>
> Michael
>
> > On 17 Sep 2016, at 21:13, Radoslaw Gruchalski <ra...@gruchalski.com> wrote:
> >
> > Please read this article:
> > https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
> >
> > –
> > Best regards,
> > Radek Gruchalski
> > ra...@gruchalski.com
Re: any update on this?
Yes, of course the goal shouldn't be moving towards Consul. It should just be flexible enough for users to pick any distributed coordination system.

On Mon, Sep 19, 2016 2:23 AM, Jens Rantil jens.ran...@tink.se wrote:

I think I read somewhere that the long-term goal is to make Kafka independent of Zookeeper altogether. Maybe not worth spending time on migrating to Consul in that case.

Cheers,
Jens

On Sat, Sep 17, 2016 at 10:38 PM Jennifer Fountain <jfount...@meetme.com> wrote:

> +2 watching.
>
> On Sat, Sep 17, 2016 at 2:45 AM, kant kodali <kanth...@gmail.com> wrote:
>
> > https://issues.apache.org/jira/browse/KAFKA-1793
> > It would be great to use Consul instead of Zookeeper for Kafka, and I think
> > Kafka would benefit a lot from the exponentially growing Consul community.

-- Jens Rantil, Backend Developer @ Tink
Re: Kafka usecase
Why does Comcast need to do better than 1-2 seconds? On Sun, Sep 18, 2016 8:08 PM, Ghosh, Achintya (Contractor) achintya_gh...@comcast.com wrote: Hi there, We have a use case where we run a lot of business logic to process each message, and sometimes that takes 1-2 sec. Will Kafka fit our use case? Thanks Achintya
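A consumer that spends 1-2 sec per message is usually fine as long as it keeps calling poll(); throughput then comes from adding partitions and consumer instances. A minimal Java sketch, assuming placeholder broker/group/topic names and a hand-tuned max.poll.records so a slow batch doesn't stall the group:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SlowProcessingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "slow-processors");           // placeholder group
        props.put("enable.auto.commit", "false");           // commit only after the work is done
        props.put("max.poll.records", "10");                // small batches so 1-2 sec/message stays manageable
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    process(record.value()); // the 1-2 sec business logic goes here
                }
                consumer.commitSync(); // mark the batch processed only after it succeeded
            }
        }
    }

    private static void process(String value) {
        // stand-in for the real business logic
    }
}

Scaling out is then a matter of running more instances of this consumer in the same group, up to the partition count of the topic.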
Re: why did Kafka choose pull instead of push for a consumer ?
Still, it should be possible to implement using reactive streams, right? Could you please enlighten me on what are some of the major differences you see between a commit log and a message queue? I see them being different only in the implementation, not functionality-wise, so I would be glad to hear your thoughts. On Sat, Sep 17, 2016 12:39 PM, Radoslaw Gruchalski ra...@gruchalski.com wrote: Kafka is not a queue. It’s a distributed commit log. – Best regards, Radek Gruchalski ra...@gruchalski.com On September 17, 2016 at 9:23:09 PM, kant kodali (kanth...@gmail.com) wrote: Hmm... looks like Kafka is written in Scala. There is this thing called reactive streams where a slow consumer can apply back pressure if it is consuming slowly. Even in Java this is possible with a library called RxJava, and these ideas will be incorporated in Java 9 as well. I still don't see why they would pick poll just to solve this one problem while compensating on others. Poll just doesn't sound realtime. I heard from some people that they would set poll to 100ms. Well, 1) that is a lot of time. 2) Financial applications require microsecond latency. Kafka, from what I understand, looks like it has very high latency, and here is the article: http://bravenewgeek.com/dissecting-message-queues/ I usually don't go by articles, but I ran my own experiments on different queues and my numbers are very close to this article, so I would say whoever wrote it has done a good job. 3) Poll does generate unnecessary traffic in case the data isn't available. Finally, still not sure why they would pick poll()? Or do they plan on introducing reactive streams? Thanks, kant On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski ra...@gruchalski.com wrote: I'm only guessing here regarding whether this is the reason: pull is much more sensible when a lot of data is pushed through. It allows consumers to consume at their own pace; slow consumers do not slow the complete system down. -- Best regards, Rad On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" <kanth...@gmail.com> wrote: why did Kafka choose pull instead of push for a consumer? Push sounds more realtime to me than poll, and wouldn't poll just keep polling even when there are no messages in the broker, causing more traffic? Please enlighten me
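One point worth adding here: poll() in the Java consumer is a blocking long poll, not a busy loop. The broker parks the fetch request until data arrives or the wait expires, so an idle consumer generates very little traffic, and a 100ms timeout does not add 100ms of latency when data is already available. A minimal sketch, assuming placeholder broker and topic names:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LongPollDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "long-poll-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Server-side long poll: the broker holds the fetch open until at least
        // 1 byte is available or 500 ms have passed, whichever comes first.
        props.put("fetch.min.bytes", "1");
        props.put("fetch.max.wait.ms", "500");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo")); // placeholder topic
            while (true) {
                // Blocks for up to 100 ms waiting for data; returns as soon as data is ready.
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("offset=%d value=%s%n", r.offset(), r.value());
                }
            }
        }
    }
}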
Re: why did Kafka choose pull instead of push for a consumer ?
Hmm... looks like Kafka is written in Scala. There is this thing called reactive streams where a slow consumer can apply back pressure if it is consuming slowly. Even in Java this is possible with a library called RxJava, and these ideas will be incorporated in Java 9 as well. I still don't see why they would pick poll just to solve this one problem while compensating on others. Poll just doesn't sound realtime. I heard from some people that they would set poll to 100ms. Well, 1) that is a lot of time. 2) Financial applications require microsecond latency. Kafka, from what I understand, looks like it has very high latency, and here is the article: http://bravenewgeek.com/dissecting-message-queues/ I usually don't go by articles, but I ran my own experiments on different queues and my numbers are very close to this article, so I would say whoever wrote it has done a good job. 3) Poll does generate unnecessary traffic in case the data isn't available. Finally, still not sure why they would pick poll()? Or do they plan on introducing reactive streams? Thanks, kant On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski ra...@gruchalski.com wrote: I'm only guessing here regarding whether this is the reason: pull is much more sensible when a lot of data is pushed through. It allows consumers to consume at their own pace; slow consumers do not slow the complete system down. -- Best regards, Rad On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" <kanth...@gmail.com> wrote: why did Kafka choose pull instead of push for a consumer? Push sounds more realtime to me than poll, and wouldn't poll just keep polling even when there are no messages in the broker, causing more traffic? Please enlighten me
why did Kafka choose pull instead of push for a consumer ?
Why did Kafka choose pull instead of push for a consumer? Push sounds more realtime to me than poll, and wouldn't poll just keep polling even when there are no messages in the broker, causing more traffic? Please enlighten me.
Re: can one topic be registered in multiple brokers?
So Zookeeper will select which broker it should direct the message to if I have 3 brokers and 3 partitions of a topic? I only finished the benchmarks of Kafka and NSQ and am still working on NATS Streaming Server (one more day and I will finish it). So far Kafka had great throughput with the Java client: about 55K messages per second receive throughput and 70K messages per second send throughput, each message 1KB in size. So yeah, something is wrong with the node.js client. NSQ was ~3K receive throughput and ~4K send throughput. One thing about my NSQ benchmark: first of all, I am new to Go, and I have the following code (pasting only the lines that are relevant). I just don't know whether w.Publish is a blocking call? If it is, I have to see if the library offers any non-blocking async call, since the Java version I wrote for Kafka was async. That way I am comparing apples to apples.

w, _ := nsq.NewProducer("172.31.18.175:4150", config)
for i := 0; i < 50; i++ {
    err := w.Publish("nsq_go_test", []byte(data))
    if err != nil {
        log.Panic("Could not connect")
    }
}

From what I heard, beating NATS is going to be harder, but I would rather run my own experiment than go by what I heard. On Sat, Sep 17, 2016 1:38 AM, Ali Akhtar ali.rac...@gmail.com wrote: You can create multiple partitions of a topic and Kafka will attempt to distribute them evenly. E.g. if you have 3 brokers and you create 3 partitions of a topic, each broker will be the leader of 1 of the 3 partitions. P.S. how did the benchmarking go? On Sat, Sep 17, 2016 at 1:36 PM, kant kodali <kanth...@gmail.com> wrote: > can one topic be registered in multiple brokers? if so, which component of > kafka decides which broker should get the message for that particular > topic? > Thanks!
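For comparison, this is roughly what the async style looks like with the Kafka Java producer: send() queues the record and returns immediately, and a callback fires when the broker acks. A sketch with placeholder broker/topic names, not the actual benchmark code:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AsyncSendSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("acks", "1");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 50; i++) {
                // Non-blocking: the record is batched and sent by a background I/O thread.
                producer.send(new ProducerRecord<>("bench_test", "data"), (metadata, exception) -> {
                    if (exception != null) {
                        exception.printStackTrace(); // the ack failed
                    }
                });
            }
        } // close() flushes any outstanding sends before returning
    }
}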
can one topic be registered in multiple brokers?
Can one topic be registered in multiple brokers? If so, which component of Kafka decides which broker should get the message for that particular topic? Thanks!
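For the record, partitions are the unit that spreads a topic across brokers, and each partition has exactly one leader broker that takes the writes for it. A sketch with the 0.10-era CLI (host and topic names assumed):

bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic my_topic --partitions 3 --replication-factor 1
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my_topic

The "Leader" column in the describe output shows which broker owns each partition; the producer client (not Zookeeper) picks a partition per message and sends it straight to that partition's leader.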
any update on this?
https://issues.apache.org/jira/browse/KAFKA-1793 It would be great to use Consul instead of Zookeeper for Kafka, and I think Kafka would benefit a lot from the exponentially growing Consul community.
Re: which port should I use 9091 or 9092 or 2181 to send messages through kafka when using a client Library?
@Umesh According to these examples, it looks like both the producer and consumer specify bootstrap.servers. What is PLAINTEXT? Do I need to change something here https://github.com/apache/kafka/blob/trunk/config/server.properties ? Because when I specify port 9092 for the producer and consumer (or just either of them) it doesn't seem to work. It only seems to work when I specify the zookeeper port, and I don't know why. https://github.com/apache/kafka/blob/trunk/examples/src/main/java/kafka/examples/Producer.java#L34 https://github.com/apache/kafka/blob/trunk/examples/src/main/java/kafka/examples/Consumer.java On Thu, Sep 15, 2016 8:15 AM, UMESH CHAUDHARY umesh9...@gmail.com wrote: No, that is not required when you use the new consumer API. You have to specify bootstrap.servers, which will use 9092 (usually for PLAINTEXT). With the old consumer API you need the zookeeper server, which listens on 2181. On Thu, 15 Sep 2016 at 17:03 kant kodali <kanth...@gmail.com> wrote: > I haven't changed anything from > https://github.com/apache/kafka/blob/trunk/config/server.properties > and it looks like it is pointing to zookeeper. > Question: > Does the producer client need to point at 9092 and the consumer at 2181? > Is that the standard? Why not have both point to the same thing? > > On Thu, Sep 15, 2016 4:24 AM, Ali Akhtar ali.rac...@gmail.com > wrote: > Examine server.properties and see which port you're using in there > > On Thu, Sep 15, 2016 at 3:52 PM, kant kodali <kanth...@gmail.com> wrote: > > > which port should I use, 9091 or 9092 or 2181, to send messages through kafka > > > when using a client library? > > > I start kafka as follows: > > > sudo bin/zookeeper-server-start.sh config/zookeeper.properties > > > sudo ./bin/kafka-server-start.sh config/server.properties > > > and I don't see any process running on 9091 or 9092, however a lot of client > > > library examples have a consumer client pointing to 9092, for example here > > > https://github.com/apache/kafka/blob/trunk/examples/src/main/java/kafka/examples/Producer.java#L34 > > > Shouldn't both producer and consumer point to zookeeper port 2181, which I > > > am assuming will do the lookup? > > > Thanks, Kant
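To make the port question concrete: PLAINTEXT is just the unencrypted listener protocol in the broker's listeners setting (e.g. listeners=PLAINTEXT://:9092 in server.properties), and with the new clients both sides talk to that listener, never to Zookeeper's 2181. A minimal sketch, assuming a local broker and a placeholder topic:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PortDemo {
    public static void main(String[] args) {
        // Producer and consumer both point at the broker listener on 9092.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092"); // not 2181
        consumerProps.put("group.id", "port-demo");
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            producer.send(new ProducerRecord<>("demo", "hello")); // placeholder topic
            producer.flush();
            consumer.subscribe(Collections.singletonList("demo"));
            System.out.println(consumer.poll(5000).count() + " record(s) received");
        }
    }
}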
Re: which port should I use 9091 or 9092 or 2181 to send messages through kafka when using a client Library?
I haven't changed anything from https://github.com/apache/kafka/blob/trunk/config/server.properties and it looks like it is pointing to zookeeper. Question: Does the producer client need to point at 9092 and the consumer at 2181? Is that the standard? Why not have both point to the same thing? On Thu, Sep 15, 2016 4:24 AM, Ali Akhtar ali.rac...@gmail.com wrote: Examine server.properties and see which port you're using in there On Thu, Sep 15, 2016 at 3:52 PM, kant kodali <kanth...@gmail.com> wrote: which port should I use, 9091 or 9092 or 2181, to send messages through kafka when using a client library? I start kafka as follows: sudo bin/zookeeper-server-start.sh config/zookeeper.properties sudo ./bin/kafka-server-start.sh config/server.properties and I don't see any process running on 9091 or 9092, however a lot of client library examples have a consumer client pointing to 9092, for example here https://github.com/apache/kafka/blob/trunk/examples/src/main/java/kafka/examples/Producer.java#L34 Shouldn't both producer and consumer point to zookeeper port 2181, which I am assuming will do the lookup? Thanks, Kant
which port should I use 9091 or 9092 or 2181 to send messages through kafka when using a client Library?
Which port should I use, 9091 or 9092 or 2181, to send messages through kafka when using a client library? I start kafka as follows: sudo bin/zookeeper-server-start.sh config/zookeeper.properties sudo ./bin/kafka-server-start.sh config/server.properties and I don't see any process running on 9091 or 9092, however a lot of client library examples have a consumer client pointing to 9092, for example here https://github.com/apache/kafka/blob/trunk/examples/src/main/java/kafka/examples/Producer.java#L34 Shouldn't both producer and consumer point to zookeeper port 2181, which I am assuming will do the lookup? Thanks, Kant
Re: What is the fair setup of Kafka to be comparable with NATS or NSQ?
Hi Ben, Also, the article you pointed out clearly shows the setup had multiple partitions, multiple workers, and so on. That's why, at this point, my biggest question is: what is the fair setup for Kafka so it's comparable with NATS and NSQ? Since you suspect the client library, I can give that a go, but can you please confirm that one partition on one broker should be able to handle 300K messages of 1KB each? Thanks, kant On Thu, Sep 15, 2016 2:28 AM, kant kodali kanth...@gmail.com wrote: Hi Ben, I can give that a try, but can you tell me the suspicion or motivation behind it? In other words, do you think a single partition and single broker should be comparable to the setup I had with NATS and NSQ, except you suspect the client library or something? Thanks, Kant On Thu, Sep 15, 2016 2:16 AM, Ben Davison ben.davi...@7digital.com wrote: Hi Kant, I was following the other thread; can you try using a different benchmarking client for a test? https://grey-boundary.io/load-testing-apache-kafka-on-aws/ Ben On Thursday, 15 September 2016, kant kodali <kanth...@gmail.com> wrote: With Kafka I tried it with 10 messages with a single broker and only one partition; that looked instantaneous, ~5K messages/sec for a data size of 1KB. I tried it with 1000 messages; that looked instantaneous as well, ~5K messages/sec for a data size of 1KB. I tried it with 10K messages with a single broker and only one partition; things started to go down, ~1K messages/sec for a data size of 1KB. Is having only one partition on a single broker bad? My goal is to run some basic benchmarks on NATS & NSQ & KAFKA. I have the same environment for all three (NATS & NSQ & KAFKA): a broker on Machine 1, producer on Machine 2, consumer on Machine 3, with a data size of 1KB (so each message is 1KB) and m4.xlarge AWS instances. I have pushed 300K messages with NATS and it was able to handle them easily; receive throughput was 5K messages/sec. I have pushed 300K messages with NSQ and receive throughput was 2K messages/sec. I am unable to push 300K messages with Kafka with the above configuration and environment. So at this point my biggest question is: what is the fair setup for Kafka so it's comparable with NATS and NSQ? kant
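As a sanity check that takes the node.js client out of the picture, a minimal Java producer throughput test could look like the following (broker address, topic name, and message count are assumptions; Kafka also ships a kafka-producer-perf-test.sh tool for the same purpose):

import java.util.Properties;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ThroughputTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("acks", "1");
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        byte[] payload = new byte[1024]; // 1KB per message, matching the tests above
        int total = 300_000;
        AtomicLong acked = new AtomicLong();

        long start = System.currentTimeMillis();
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < total; i++) {
                // Async send; count each broker ack in the callback.
                producer.send(new ProducerRecord<>("bench", payload), (md, e) -> {
                    if (e == null) acked.incrementAndGet();
                });
            }
            producer.flush(); // block until all outstanding sends complete
        }
        long elapsedMs = System.currentTimeMillis() - start;
        System.out.printf("acked %d msgs in %d ms (%.0f msgs/sec)%n",
                acked.get(), elapsedMs, acked.get() * 1000.0 / elapsedMs);
    }
}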
Re: What is the fair setup of Kafka to be comparable with NATS or NSQ?
Hi Ben, I can give that a try, but can you tell me the suspicion or motivation behind it? In other words, do you think a single partition and single broker should be comparable to the setup I had with NATS and NSQ, except you suspect the client library or something? Thanks, Kant On Thu, Sep 15, 2016 2:16 AM, Ben Davison ben.davi...@7digital.com wrote: Hi Kant, I was following the other thread; can you try using a different benchmarking client for a test? https://grey-boundary.io/load-testing-apache-kafka-on-aws/ Ben On Thursday, 15 September 2016, kant kodali <kanth...@gmail.com> wrote: With Kafka I tried it with 10 messages with a single broker and only one partition; that looked instantaneous, ~5K messages/sec for a data size of 1KB. I tried it with 1000 messages; that looked instantaneous as well, ~5K messages/sec for a data size of 1KB. I tried it with 10K messages with a single broker and only one partition; things started to go down, ~1K messages/sec for a data size of 1KB. Is having only one partition on a single broker bad? My goal is to run some basic benchmarks on NATS & NSQ & KAFKA. I have the same environment for all three (NATS & NSQ & KAFKA): a broker on Machine 1, producer on Machine 2, consumer on Machine 3, with a data size of 1KB (so each message is 1KB) and m4.xlarge AWS instances. I have pushed 300K messages with NATS and it was able to handle them easily; receive throughput was 5K messages/sec. I have pushed 300K messages with NSQ and receive throughput was 2K messages/sec. I am unable to push 300K messages with Kafka with the above configuration and environment. So at this point my biggest question is: what is the fair setup for Kafka so it's comparable with NATS and NSQ? kant
What is the fair setup of Kafka to be comparable with NATS or NSQ?
With Kafka I tried it with 10 messages with a single broker and only one partition; that looked instantaneous, ~5K messages/sec for a data size of 1KB. I tried it with 1000 messages; that looked instantaneous as well, ~5K messages/sec for a data size of 1KB. I tried it with 10K messages with a single broker and only one partition; things started to go down, ~1K messages/sec for a data size of 1KB. Is having only one partition on a single broker bad? My goal is to run some basic benchmarks on NATS & NSQ & KAFKA. I have the same environment for all three (NATS & NSQ & KAFKA): a broker on Machine 1, producer on Machine 2, consumer on Machine 3, with a data size of 1KB (so each message is 1KB) and m4.xlarge AWS instances. I have pushed 300K messages with NATS and it was able to handle them easily; receive throughput was 5K messages/sec. I have pushed 300K messages with NSQ and receive throughput was 2K messages/sec. I am unable to push 300K messages with Kafka with the above configuration and environment. So at this point my biggest question is: what is the fair setup for Kafka so it's comparable with NATS and NSQ? kant
Re: hi
I used node.js client libraries for all three, and yes, I want to make sure I am comparing apples to apples, so I made the setups as equivalent as possible. Again, the big question is: what is the right setup for Kafka to be comparable with the others I mentioned in my previous email? On Thu, Sep 15, 2016 1:47 AM, Ali Akhtar ali.rac...@gmail.com wrote: The issue is clearly that you're running out of resources, so I would add more brokers and/or larger instances. You're also using Node, which is not the best for performance. A compiled language such as Java would give you the best performance. Here's a case study that should help: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines Good luck, let us know how it goes
Re: hi
Yeah... I tried it with 10 messages with a single broker and only one partition; that looked instantaneous, ~5K messages/sec for a data size of 1KB. I tried it with 1000 messages; that looked instantaneous as well, ~5K messages/sec for a data size of 1KB. I tried it with 10K messages with a single broker and only one partition; things started to go down, ~1K messages/sec for a data size of 1KB. Is having only one partition on a single broker bad? My goal is to run some basic benchmarks on NATS & NSQ & KAFKA. I have the same environment for all three (NATS & NSQ & KAFKA): a broker on Machine 1, producer on Machine 2, consumer on Machine 3, with a data size of 1KB (so each message is 1KB) and m4.xlarge AWS instances. I have pushed 300K messages with NATS and it was able to handle them easily; receive throughput was 5K messages/sec. I have pushed 300K messages with NSQ and receive throughput was 2K messages/sec. I am unable to push 300K messages with Kafka with the above configuration and environment. So at this point my biggest question is: what is the fair setup for Kafka so it's comparable with NATS and NSQ? kant On Thu, Sep 15, 2016 12:43 AM, Ali Akhtar ali.rac...@gmail.com wrote: Lower the workload gradually, start from 10 messages, increase to 100, then 1000, and so on. See if it slows down as the workload increases. If so, you need more brokers + partitions to handle the workload.
Re: hi
m4.xlarge On Thu, Sep 15, 2016 12:33 AM, Ali Akhtar ali.rac...@gmail.com wrote: What's the instance size that you're using? With 300k messages your single broker might not be able to handle it.
Re: hi
My goal is to test the throughput (#messages per second) given my setup and a data size of 1KB. If you guys already have some idea of these numbers, that would be helpful as well. On Thu, Sep 15, 2016 12:24 AM, kant kodali kanth...@gmail.com wrote: The 172.* addresses are all private IPs for my machines; I double-checked it. I have not changed any default settings. I don't know how to use kafka-consumer.sh or kafka-producer.sh because it looks like they want me to specify a group, and I didn't create any consumer group because I am using a single producer and consumer. Is there a default group? Also, I am receiving messages, but very late. I sent about 300K messages using the node.js client and I am receiving at a very low rate. Really not sure what is going on? On Thu, Sep 15, 2016 12:06 AM, Ali Akhtar ali.rac...@gmail.com wrote: Your code seems to be using the public ip of the servers. If all 3 machines are in the same availability zone on AWS, try using the private ip, and then they might communicate over the local network. Did you change any default settings? Do you get the same results if you run kafka-consumer.sh and kafka-producer.sh instead of the Node code?
Re: hi
The 172.* addresses are all private IPs for my machines; I double-checked it. I have not changed any default settings. I don't know how to use kafka-consumer.sh or kafka-producer.sh because it looks like they want me to specify a group, and I didn't create any consumer group because I am using a single producer and consumer. Is there a default group? Also, I am receiving messages, but very late. I sent about 300K messages using the node.js client and I am receiving at a very low rate. Really not sure what is going on? On Thu, Sep 15, 2016 12:06 AM, Ali Akhtar ali.rac...@gmail.com wrote: Your code seems to be using the public ip of the servers. If all 3 machines are in the same availability zone on AWS, try using the private ip, and then they might communicate over the local network. Did you change any default settings? Do you get the same results if you run kafka-consumer.sh and kafka-producer.sh instead of the Node code?
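Side note on the scripts mentioned above: the bundled tools are actually called kafka-console-producer.sh and kafka-console-consumer.sh, and neither needs a pre-created consumer group. A sketch with 0.10-era flags (host and topic assumed):

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kafka_test

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic kafka_test --from-beginning

The producer reads lines from stdin, one message per line; the consumer prints everything in the topic, which makes the pair handy for ruling the client library in or out.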
Re: hi
They are hosted on AWS, and I don't think there are any network issues because I tried testing other queuing systems with no issues. However, I am using a node.js client with the following code. I am not sure if there are any errors or anything I didn't set in the following code?

// producer
var kafka = require('kafka-node');
var Producer = kafka.Producer;
var Client = kafka.Client;
var client = new Client('172.31.21.175:2181'); // kafka-node's old API connects via the Zookeeper address
var argv = require('optimist').argv;
var topic = argv.topic || 'kafka_test';
var p = argv.p || 0;
var a = argv.a || 0;
var producer = new Producer(client, { requireAcks: 1 });
var num = 35;

producer.on('ready', function () {
    var message = 'Hello World';
    for (var i = 0; i < num; i++) {
        producer.send([ { topic: topic, partition: p, messages: message, attributes: a } ], function (err, result) {
            console.log(err || result);
            //process.exit();
        });
    }
});

producer.on('error', function (err) {
    console.log('error', err);
    process.exit();
});

// Consumer
var kafka = require('kafka-node');
var Consumer = kafka.Consumer;
var Offset = kafka.Offset;
var Client = kafka.Client;
var argv = require('optimist').argv;
var topic = argv.topic || 'kafka_test';
var client = new Client('172.31.21.175:2181');
var topics = [ { topic: topic, partition: 0 } ];
var options = { autoCommit: false, fetchMaxWaitMs: 1000 };
var consumer = new Consumer(client, topics, options);
var offset = new Offset(client);
var start;
var received = 0;
var target = 20;   // note: the rate is measured over only the first 20 messages
var hash = 1000;

consumer.on('message', function (message) {
    console.log(message); // logging every message also slows the measured rate
    received += 1;
    if (received === 1) { start = new Date(); }
    if (received === target) {
        var stop = new Date();
        console.log('\nDone test');
        var mps = parseInt(target / ((stop - start) / 1000));
        console.log('Received at ' + mps + ' msgs/sec');
        process.exit();
    } else if (received % hash === 0) {
        process.stdout.write(received + '\n');
    }
});

consumer.on('error', function (err) {
    console.log('error', err);
});

On Wed, Sep 14, 2016 11:58 PM, Ali Akhtar ali.rac...@gmail.com wrote: It sounds like a network issue. Where are the 3 servers located / hosted? On Thu, Sep 15, 2016 at 11:51 AM, kant kodali <kanth...@gmail.com> wrote: Hi, I have the following setup: single Kafka broker and Zookeeper on Machine 1, single Kafka producer on Machine 2, single Kafka consumer on Machine 3. When a producer client sends a message to the Kafka broker by pointing at the Zookeeper server, the consumer doesn't seem to get the message right away; instead it arrives after a minute or so (pretty late). I am not sure what settings I need to change. Any ideas? Thanks, kant
hi
Hi, I have the following setup: single Kafka broker and Zookeeper on Machine 1, single Kafka producer on Machine 2, single Kafka consumer on Machine 3. When a producer client sends a message to the Kafka broker by pointing at the Zookeeper server, the consumer doesn't seem to get the message right away; instead it arrives after a minute or so (pretty late). I am not sure what settings I need to change. Any ideas? Thanks, kant
Consumer stops after reaching an offset of 1644
Hi All, I am trying to do a simple benchmark test for Kafka using a single broker, one producer, and one consumer; however, my consumer doesn't seem to receive all the messages the producer sends, so I'm not sure what is going on. Any help? Here is the full description of the problem: http://stackoverflow.com/questions/39500780/i-am-trying-to-benchmark-kafka-using-node-js-but-it-stops-working-after-certain Thanks! kant