Should change the type of log.segment.bytes from INT to LONG

2017-05-16 Thread Yang Cui
We wish to enlarge the segment file size from 1GB to 2GB, but we found that it throws the exception: “Invalid value 2147483648 for configuration log.segment.bytes: Not a number of type INT”. This is caused by INT overflow when “log.segment.bytes” is set to 2147483648. Now we are
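
For reference, log.segment.bytes is validated as a 32-bit INT, so the largest value a broker of this vintage accepts is Integer.MAX_VALUE bytes, one byte short of 2GB. A minimal sketch (values are illustrative, not a recommendation):

```java
import java.util.Properties;

public class SegmentSizeSketch {
    public static void main(String[] args) {
        Properties brokerProps = new Properties();
        // Largest accepted segment size: 2^31 - 1 bytes (just under 2GB)
        brokerProps.put("log.segment.bytes", String.valueOf(Integer.MAX_VALUE));
        // brokerProps.put("log.segment.bytes", "2147483648"); // rejected: overflows INT
        System.out.println(brokerProps);
    }
}
```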

Re: Brokers are down due to “java.io.IOException: Too many open files”

2017-05-16 Thread Yang Cui
Hi Caleb, We had already set the maximum number of open files to 100,000 before this error happened. Normally, the file descriptor count is about 20,000, but at times it suddenly jumps to a much higher count. This is our monitoring of Kafka FD info: 2017-05-17-05:04:19 FD_total_num:19261

Creating read-only state stores from a compacted topic

2017-05-16 Thread Frank Lyaruu
Hi Kafka people, I'm using the low-level API and often create a simple state store based on a (single-partition, compacted) topic (it consumes all messages and simply stores them as-is). Now all these stores get their own 'changelog' topic to back the state store. This is fine, but
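
A minimal sketch of the pattern being described, against the 0.10.x low-level API; the topic, store, and processor names are placeholders, and disableLogging() is the knob that, if your version supports it, avoids the extra changelog topic when the source is already compacted:

```java
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.ProcessorSupplier;
import org.apache.kafka.streams.processor.TopologyBuilder;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class ReadOnlyStoreSketch {

    public static TopologyBuilder build() {
        // Processor that copies every message from the compacted topic into the store as-is
        ProcessorSupplier<String, String> copier = () -> new AbstractProcessor<String, String>() {
            private KeyValueStore<String, String> store;

            @Override
            @SuppressWarnings("unchecked")
            public void init(ProcessorContext context) {
                super.init(context);
                store = (KeyValueStore<String, String>) context.getStateStore("local-copy");
            }

            @Override
            public void process(String key, String value) {
                store.put(key, value);
            }
        };

        TopologyBuilder builder = new TopologyBuilder();
        builder.addSource("source", "compacted-topic")   // placeholder topic name
               .addProcessor("proc", copier, "source")
               .addStateStore(Stores.create("local-copy")
                       .withStringKeys()
                       .withStringValues()
                       .persistent()
                       .disableLogging()  // skip the extra changelog topic, if supported
                       .build(), "proc");
        return builder;
    }
}
```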

Re: weird SerializationException when consumer is fetching and parsing record in streams application

2017-05-16 Thread Sachin Mittal
Folks, is there any update on https://issues.apache.org/jira/browse/KAFKA-4740? This is so far the only pressing issue for our streams application. It happens from time to time, and so far our only solution is to increase the offset of the group for that partition beyond the offset that caused
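
For reference, a hedged sketch of the "increase the offset" workaround: with the streams app stopped, commit an offset just past the bad record for the affected group and partition (group id, topic, partition, and offset values are placeholders):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class SkipBadRecord {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");
        props.put("group.id", "my-streams-app-id");  // for Streams, application.id == group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        long badOffset = 1234L;  // offset of the record that fails to deserialize
        TopicPartition tp = new TopicPartition("my-topic", 3);
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));
            // Commit the offset just past the bad record so the group skips it on restart
            consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(badOffset + 1)));
        }
    }
}
```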

Re: Reg: [VOTE] KIP 157 - Add consumer config options to streams reset tool

2017-05-16 Thread Ewen Cheslack-Postava
+1 (binding) I mentioned this in the PR that triggered this: > KIP is accurate, though this is one of those things that we should probably get a KIP for a standard set of config options across all tools so additions like this can just fall under the umbrella of that KIP... I think it would be

Re: Data loss after a Kafka broker restart scenario.

2017-05-16 Thread Fathima Amara
Hi Mathieu, Thanks for replying. I've already tried setting "retries" to higher values (up to a maximum of 3). Since this introduces duplicates, which I do not require, I brought the "retries" value back to 0. I would like to know whether there is a way to achieve an "exactly-once" guarantee having

Re: Debugging Kafka Streams Windowing

2017-05-16 Thread Mahendra Kariya
I am confused. If what you have mentioned is the case, then: - Why would restarting the stream processes resolve the issue? - Why do we get this infinite stream of exceptions only on some boxes in the cluster and not all? - We have tens of other consumers running just fine. We see

Re: What purpose serves the repartition topic?

2017-05-16 Thread João Peixoto
Your explanation makes sense. If I understood correctly it means that one stream thread can actually generate records that will be aggregated in a different thread, based on the new partitioning. I didn't think of that case, which now makes more sense. In my particular case the keys just get

Re: What purpose serves the repartition topic?

2017-05-16 Thread Matthias J. Sax
João, in your example, k.toUpperCase() does break partitioning. Assume you have two records with different keys that might be contained in different partitions. After you do a selectKey(), both have the same key. In order to compute the aggregation correctly, it is
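
A short sketch of this in the 0.10.x DSL (topic and store names are illustrative): after selectKey() the records whose keys are now equal may still sit on different partitions, so groupByKey() routes them through an internal repartition topic before the aggregation:

```java
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

public class RepartitionSketch {
    public static KStreamBuilder build() {
        KStreamBuilder builder = new KStreamBuilder();
        KStream<String, String> source = builder.stream("input-topic");
        KTable<String, Long> counts = source
                .selectKey((k, v) -> k.toUpperCase())  // new key: records may be on the "wrong" partition
                .groupByKey()                          // Streams inserts an internal repartition topic here
                .count("counts-store");                // all records with the same new key reach one thread
        return builder;
    }
}
```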

What purpose serves the repartition topic?

2017-05-16 Thread João Peixoto
Certain operations require a repartition topic, such as "selectKey" or "map". What purpose does this repartition topic serve? Sample record: {"key": "a", ...} Stream: source.selectKey((k, v) -> KeyValue.pair(k.toUpperCase(), v)).groupByKey() //... From my understanding, the repartition topic will

Re: Can state stores function as a caching layer for persistent storage

2017-05-16 Thread João Peixoto
Thanks Matthias. My doubt is on a more fundamental level, but I'll ask on a separate thread as per Eno's recommendation. On Tue, May 16, 2017 at 3:26 PM Matthias J. Sax wrote: > Just to add to this: > > Streams create re-partitioning topics "lazy", meaning when the key is

Re: Debugging Kafka Streams Windowing

2017-05-16 Thread Guozhang Wang
Sorry, I misread your email and confused it with another thread. As for your observed issue, it seems "broker-05:6667", which is the group coordinator for this stream process app with app id (i.e. group id) "grp_id", is in an unstable state. Since the streams app cannot commit offsets anymore due

Re: Kafka-streams process stopped processing messages

2017-05-16 Thread Guozhang Wang
Hi Shimi, Just to clarify, your scenario is that 1) you shut down the single-instance app cleanly, and 2) you restart the app as a single instance, and that restart is taking hours, right? Did you use any in-memory stores (i.e. not RocksDB) in your topology? Guozhang On Tue, May 16, 2017 at 1:08 AM,

Re: Can state stores function as a caching layer for persistent storage

2017-05-16 Thread Matthias J. Sax
Just to add to this: Streams creates re-partitioning topics "lazily", meaning that when the key (potentially) changes, we only set a flag but don't add the re-partitioning topic. On groupByKey() or join() we check if this "re-partitioning required" flag is set and add the topic. Thus, if you have a

Log compaction failed because offset map doesn't have enough space

2017-05-16 Thread Jun Ma
Hi team, We are having an issue with compacting the __consumer_offsets topic in our cluster. We’re seeing logs in log-cleaner.log saying: [2017-05-16 11:56:28,993] INFO Cleaner 0: Building offset map for log __consumer_offsets-15 for 349 segments in offset range [0, 619265471). (kafka.log.LogCleaner)
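
A hedged note on the usual remedy: the cleaner builds its offset map in a buffer bounded by log.cleaner.dedupe.buffer.size (total across cleaner threads), so enlarging it lets each cleaning round cover more segments. A sketch of the broker-side settings, with illustrative values (normally placed in server.properties):

```java
import java.util.Properties;

public class CleanerBufferSketch {
    public static void main(String[] args) {
        Properties broker = new Properties();
        // Bytes available for the cleaner's offset map, shared across cleaner threads
        broker.put("log.cleaner.dedupe.buffer.size", String.valueOf(2L * 1024 * 1024 * 1024));
        broker.put("log.cleaner.threads", "2");  // illustrative
        System.out.println(broker);
    }
}
```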

Re: Kafka Streams Failed to rebalance error

2017-05-16 Thread Matthias J. Sax
Great! :) On 5/16/17 2:31 AM, Sameer Kumar wrote: > I see now that my Kafka cluster is very stable, and these errors don't come > up now. > > -Sameer. > > On Fri, May 5, 2017 at 7:53 AM, Sameer Kumar wrote: > >> Yes, I have upgraded my cluster and client both to version

You're Invited: Streams Processing Meetup at LinkedIn on May 24

2017-05-16 Thread Ed Yakabosky
Hi Kafka & Samza communities, I wanted to send a reminder that the Streams team at LinkedIn invites you to attend a meetup here at LinkedIn on Wednesday, May 24 from 6-9 PM. You will learn about the interesting things

Re: Kafka Streams RocksDB permanent StateStoreSupplier seemingly never deletes data

2017-05-16 Thread Eno Thereska
0.10.2.1 is compatible with Kafka 0.10.1. Eno > On 16 May 2017, at 20:45, Vincent Bernardi wrote: > > The LOG files stay small. The SST files are growing but not in size, in > numbers. Old .sst files seem never written to anymore but are not deleted > and new ones appear

Re: Kafka Streams RocksDB permanent StateStoreSupplier seemingly never deletes data

2017-05-16 Thread Vincent Bernardi
The LOG files stay small. The SST files are growing not in size but in number. Old .sst files seem to never be written to anymore, but they are not deleted, and new ones appear regularly. I can certainly try streams 0.10.2.1 if it's compatible with Kafka 0.10.1. I have not checked the compatibility matrix

Re: `key.converter` per connector

2017-05-16 Thread BigData dev
Hi, key.converter and value.converter can be overridden at the connector level. This has been supported since Kafka 0.10.1.0. For more info, refer to this KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-75+-+Add+per-connector+Converters Thanks, Bharat On Tue, May 16, 2017 at 1:39 AM,
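
A sketch of what such an override looks like in a connector configuration (the connector name, class, file path, and schema-registry URL are illustrative; connectors that don't set key.converter/value.converter keep the worker-level defaults):

```java
import java.util.HashMap;
import java.util.Map;

public class PerConnectorConverterSketch {
    public static Map<String, String> config() {
        Map<String, String> connector = new HashMap<>();
        connector.put("name", "my-connector");  // illustrative
        connector.put("connector.class", "org.apache.kafka.connect.file.FileStreamSinkConnector");
        connector.put("topics", "my-topic");
        connector.put("file", "/tmp/sink.txt");
        // Per-connector overrides (KIP-75, Kafka >= 0.10.1.0)
        connector.put("key.converter", "org.apache.kafka.connect.storage.StringConverter");
        connector.put("value.converter", "io.confluent.connect.avro.AvroConverter");
        connector.put("value.converter.schema.registry.url", "http://schema-registry:8081"); // assumed endpoint
        return connector;
    }
}
```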

RE: processing JSON

2017-05-16 Thread Adaryl Wakefield
I think I might have asked that question wrong. Let's say the final destination of the data will be a format that isn't JSON. Let's say we dump it into HBase or even into a Parquet file. When should we parse the JSON to get the values? Adaryl "Bob" Wakefield, MBA Principal Mass Street

Schema changes in Kafka and impact on topic

2017-05-16 Thread Mich Talebzadeh
Hi, From time to time one may encounter schema changes on the source data streamed to Kafka through a topic. What happens when the schema changes and, say, a new column is added to the topic? What would be the easiest way of being notified of this change? Will the schema deployment do versioning, etc.?

Re: Securing Kafka - Keystore and Truststore question

2017-05-16 Thread Raghav
Many thanks, Rajini. On Tue, May 16, 2017 at 8:43 AM, Rajini Sivaram wrote: > Hi Raghav, > > If your Kafka broker is configured with *ssl.client.auth=required,* your > customer's clients need to provide a keystore. In any case, they need a > truststore since your broker

Re: Kafka Streams stopped with errors, failed to reinitialize itself

2017-05-16 Thread Guozhang Wang
I will reply on that thread. On Tue, May 16, 2017 at 2:37 AM, Sameer Kumar wrote: > Hi Guozhang, > > The errors have gone away after migrating both my brokers and API to > 0.10.2.1. > But, regarding the error, the specific threads moved from running to not > running state. >

Re: Kafka Streams RocksDB permanent StateStoreSupplier seemingly never deletes data

2017-05-16 Thread Eno Thereska
Thanks. Which RocksDB files are growing indefinitely, the LOG or the SST ones? Also, any chance you could use the latest streams library, 0.10.2.1, to check if the problem still exists? Eno > On 16 May 2017, at 16:43, Vincent Bernardi wrote: > > Just tried setting compaction

Re: KafkaStreams reports RUNNING even though all StreamThreads has crashed

2017-05-16 Thread Eno Thereska
Hi Andreas, Thanks for reporting. This sounds like a bug, but could also be a semantic thing. Couple of questions: - which version of Kafka are you using? - what is the nature of the failure of the threads, e.g., how come they have all crashed? If all threads crash, was there an exception they

Re: Securing Kafka - Keystore and Truststore question

2017-05-16 Thread Rajini Sivaram
Hi Raghav, If your Kafka broker is configured with *ssl.client.auth=required,* your customer's clients need to provide a keystore. In any case, they need a truststore since your broker is using SSL. For the truststore, you can give them the ca-cert, as you mentioned. The client keystore contains a
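
To make the two stores concrete, a hedged sketch of the client-side settings when the broker requires client authentication (all paths and passwords are placeholders):

```java
import java.util.Properties;

public class SslClientSketch {
    public static Properties sslProps() {
        Properties props = new Properties();
        props.put("security.protocol", "SSL");
        // Truststore: holds the CA cert so the client can verify the broker
        props.put("ssl.truststore.location", "/path/to/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit");
        // Keystore: holds the client's own cert, needed because ssl.client.auth=required
        props.put("ssl.keystore.location", "/path/to/client.keystore.jks");
        props.put("ssl.keystore.password", "changeit");
        props.put("ssl.key.password", "changeit");
        return props;
    }
}
```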

Re: Kafka Streams RocksDB permanent StateStoreSupplier seemingly never deletes data

2017-05-16 Thread Vincent Bernardi
Just tried setting compaction threads to 5, but I have the exact same problem: the RocksDB files get bigger and bigger, while my application never stores more than 200k K/V pairs. V. On Tue, May 16, 2017 at 5:22 PM, Vincent Bernardi wrote: > Hi Eno, > Thanks for your

KafkaStreams reports RUNNING even though all StreamThreads has crashed

2017-05-16 Thread Andreas Gabrielsson
Hi All, We recently implemented a health check for a Kafka Streams based application. The health check is simply checking the state of Kafka Streams by calling KafkaStreams.state(). It reports healthy if it’s not in PENDING_SHUTDOWN or NOT_RUNNING states. We truly appreciate having the
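
A minimal sketch of the health check as described, against the 0.10.x Streams state machine:

```java
import org.apache.kafka.streams.KafkaStreams;

public class StreamsHealthCheck {
    // Reports healthy unless Streams has begun or finished shutting down
    public static boolean isHealthy(KafkaStreams streams) {
        KafkaStreams.State state = streams.state();
        return state != KafkaStreams.State.PENDING_SHUTDOWN
            && state != KafkaStreams.State.NOT_RUNNING;
    }
}
```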

Data loss after a Kafka broker restart in a cluster

2017-05-16 Thread Fathima Amara
Hi all, I am using Kafka 2.11-0.10.0.1 and Zookeeper 3.4.8. I have a cluster of 4 servers (A, B, C, D) running one Kafka broker on each of them, and one Zookeeper server on server A. Data is initially produced from server A using a Kafka producer, and it goes through servers B, C, D, being subjected to

Re: Data loss after a Kafka Broker restart scenario

2017-05-16 Thread Jun Rao
Hi, Fathima, There is a known data loss issue that's described in KIP-101 ( https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation). The issue happens rarely, but has been exposed in some of our system

Re: Kafka Streams RocksDB permanent StateStoreSupplier seemingly never deletes data

2017-05-16 Thread Vincent Bernardi
Hi Eno, Thanks for your answer. I tried sending a follow-up email when I realised I forgot to tell you the version number, but it must have fallen through. I'm using 0.10.1.1 both for Kafka and for the streams library. Currently my application works on 4 partitions and only uses about 100% of one

Data loss after a Kafka Broker restart scenario

2017-05-16 Thread Fathima Amara
Hi all, I am using Kafka 2.11-0.10.0.1 and Zookeeper 3.4.8. I have a cluster of 4 servers (A, B, C, D) running one Kafka broker on each of them, and one Zookeeper server on server A. Data is initially produced from server A using a Kafka producer, and it goes through servers B, C, D, being subjected

Re: Can state stores function as a caching layer for persistent storage

2017-05-16 Thread Eno Thereska
(it's preferred to create another email thread for a different topic to make it easier to look back) Yes, there could be room for optimizations, e.g., see this:

Re: Kafka Streams RocksDB permanent StateStoreSupplier seemingly never deletes data

2017-05-16 Thread Vincent Bernardi
I forgot to add that I'm using Kafka and Kafka Streams 0.10.1.1. V. On 2017-05-16 15:58 (+0200), Vincent Bernardi wrote: > Hi, > I'm running an experimental Kafka Stream Processor which accumulates lots > of data in a StateStoreSupplier during transform() and forwards lots

Re: Kafka Streams RocksDB permanent StateStoreSupplier seemingly never deletes data

2017-05-16 Thread Eno Thereska
Which version of Kafka are you using? It might be that RocksDB doesn't get enough resources to compact the data fast enough. If that's the case, you can try increasing the number of background compaction threads for RocksDB through the RocksDBConfigSetter class (see
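
A sketch of that suggestion (the thread count is illustrative):

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

public class MoreCompactionThreads implements RocksDBConfigSetter {
    @Override
    public void setConfig(String storeName, Options options, Map<String, Object> configs) {
        options.setMaxBackgroundCompactions(5);  // 5 is illustrative; stock default is lower
    }
}
// Wire it in via:
// props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, MoreCompactionThreads.class);
```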

Re: Can state stores function as a caching layer for persistent storage

2017-05-16 Thread João Peixoto
Follow-up doubt (let me know if a new question should be created). Do we always need a repartitioning topic? If I'm reading things correctly, when we change the key of a record we need to make sure the new key falls on the same partition that we are processing. This makes a lot of sense if after

Re: Securing Kafka - Keystore and Truststore question

2017-05-16 Thread Raghav
Hi Rajini, This was very helpful. I have another question along similar lines. We host the Kafka broker, and we also have our own private CA. We want our customers to set up their Kafka clients (producer and consumer) over SSL using *ssl.client.auth=required*. Is there a way we can generate

Kafka Streams RocksDB permanent StateStoreSupplier seemingly never deletes data

2017-05-16 Thread Vincent Bernardi
Hi, I'm running an experimental Kafka Stream Processor which accumulates lots of data in a StateStoreSupplier during transform() and forwards lots of data during punctuate (and deletes it from the StateStoreSupplier). I'm currently using a persistent StateStore, meaning that Kafka Streams provides

Kafka Consumer Attached to partition but not consuming messages

2017-05-16 Thread Abhimanyu Nagrath
https://issues.apache.org/jira/browse/KAFKA-5221 . Regards, Abhimanyu

Re: Data loss after a Kafka broker restart scenario.

2017-05-16 Thread Mathieu Fenniak
Hi Fathima, Setting "retries=0" on the producer means that an attempt to produce a message, if it encounters an error, will result in that message being lost. It's likely the producer will encounter intermittent errors when you kill one broker in the cluster. I'd suggest trying this test with a
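
A hedged sketch of producer settings for such a test; note that pre-0.11 Kafka has no idempotent producer, so retries can still introduce duplicates:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DurableProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");  // placeholder
        props.put("acks", "all");                       // wait for all in-sync replicas
        props.put("retries", "2147483647");             // retry instead of dropping on transient errors
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"));
        }
    }
}
```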

Mitigating impact on connected producers and consumers when restarting broker node

2017-05-16 Thread Bernal, Marcos
Hello, We've got a Kafka 0.9.0.1 cluster with 4 brokers, a few hundred topics, 20-50 consumers, and 10-15 producers. Often we have to make changes to the cluster config, and we do rolling restarts with controlled shutdowns. However, our consumers and producers see a big impact when: - The
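
A hedged sketch of broker settings that commonly soften rolling restarts on this version (illustrative; normally set in server.properties):

```java
import java.util.Properties;

public class RollingRestartSketch {
    public static void main(String[] args) {
        Properties broker = new Properties();
        broker.put("controlled.shutdown.enable", "true");    // migrate partition leadership away before stopping
        broker.put("auto.leader.rebalance.enable", "true");  // restore preferred leaders after the broker returns
        System.out.println(broker);
    }
}
```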

Data loss after a Kafka broker restart scenario.

2017-05-16 Thread Fathima Amara
Hi all, I am using Kafka 2.11-0.10.0.1 and Zookeeper 3.4.8. I have a cluster of 4 servers (A, B, C, D) running one Kafka broker on each of them, and one Zookeeper server on server A. Data is initially produced from server A using a Kafka producer, and it goes through servers B, C, D, being subjected

Re: is anyone able to create 1M or 10M or 100M or 1B partitions in a topic?

2017-05-16 Thread Ofir Manor
Each partition is a directory whose name is topicName-partitionNumber. In each directory there are likely many files, based on how you configure Kafka log rotation and retention (check the log.roll.* parameters, for starters), plus some internal files (indexes etc.). You can easily try it out

Re: is anyone able to create 1M or 10M or 100M or 1B partitions in a topic?

2017-05-16 Thread kant kodali
Got it! But do you mean directories or files? I thought partition = file and topic = directory. If there are 1M partitions in a topic, does that mean there are 1M files in one directory? On Tue, May 16, 2017 at 2:29 AM, Sameer Kumar wrote: > I don't think this would be

Re: Kafka Streams stopped with errors, failed to reinitialize itself

2017-05-16 Thread Sameer Kumar
Hi Guozhang, The errors have gone away after migrating both my brokers and API to 0.10.2.1. But, regarding the error, the specific threads moved from running to not running state. -Sameer. On Tue, May 9, 2017 at 12:16 AM, Guozhang Wang wrote: > Hi Sameer, > > I looked at the

Re: Kafka Streams Failed to rebalance error

2017-05-16 Thread Sameer Kumar
I see now that my Kafka cluster is very stable, and these errors don't come up now. -Sameer. On Fri, May 5, 2017 at 7:53 AM, Sameer Kumar wrote: > Yes, I have upgraded my cluster and client both to version 0.10.2.1 and > currently monitoring the situation. > Will report back

Re: is anyone able to create 1M or 10M or 100M or 1B partitions in a topic?

2017-05-16 Thread Sameer Kumar
I don't think this would be the right approach. From the broker side, this would mean creating 1M/10M/100M/1B directories, which would be too much for the file system itself. In most cases, even a few thousand partitions per node should be sufficient. For more details, please refer to

Re: is anyone able to create 1M or 10M or 100M or 1B partitions in a topic?

2017-05-16 Thread kant kodali
Forgot to mention: the question in this thread is for one node which has 8 CPUs, 16GB RAM & 500GB of hard disk space. On Tue, May 16, 2017 at 2:06 AM, kant kodali wrote: > Hi All, > > 1. I was wondering if anyone has seen or heard of or been able to create 1M or 10M > or 100M or 1B

is anyone able to create 1M or 10M or 100M or 1B partitions in a topic?

2017-05-16 Thread kant kodali
Hi All, 1. I was wondering if anyone has seen or heard of or been able to create 1M or 10M or 100M or 1B partitions in a topic? I understand a lot of this depends on filesystem limitations (we are using ext4) and OS limitations, but I would just like to know what scale people have seen in production?

`key.converter` per connector

2017-05-16 Thread Nicolas Fouché
Hi, In distributed mode, can I set `key.converter` in a connector configuration? Some of my connectors use `org.apache.kafka.connect.storage.StringConverter` while others need `io.confluent.connect.avro.AvroConverter`. So I wondered if `key.converter` could be overridden at the connector

Re: processing JSON

2017-05-16 Thread Affan Syed
Bob, serialization happens at both ends, and is entirely optional On Tue, May 16, 2017 at 8:43 AM, Adaryl Wakefield < adaryl.wakefi...@hotmail.com> wrote: > Should you serialize JSON as part of the producer or consumer? > > Adaryl "Bob" Wakefield, MBA > Principal > Mass Street Analytics, LLC >

Re: Reg: [VOTE] KIP 157 - Add consumer config options to streams reset tool

2017-05-16 Thread Eno Thereska
+1 thanks. Eno > On 16 May 2017, at 04:20, BigData dev wrote: > > Hi All, > Given the simple and non-controversial nature of the KIP, I would like to > start the voting process for KIP-157: Add consumer config options to > streams reset tool > >

Re: Kafka-streams process stopped processing messages

2017-05-16 Thread Eno Thereska
Hi Shimi, Could we start a new email thread on the slow booting to separate it from the initial thread (call it "slow boot" or something)? Thank you. Also, could you provide the logs for the booting part if possible, together with your streams config. Thanks Eno > On 15 May 2017, at 20:49,

Re: Kafka Connect CPU spikes bring down Kafka Connect workers

2017-05-16 Thread Ewen Cheslack-Postava
On Mon, May 15, 2017 at 2:06 PM, Phillip Mann wrote: > Currently, Kafka Connect experiences a spike in CPU usage which causes > Kafka Connect to crash. What kind of crash? Can you provide an error or stacktrace? > There is really no useful information from the logs to

Why can't Kafka copy data to a new replica automatically when one replica is down?

2017-05-16 Thread wuchang
I have a Kafka cluster with 3 brokers: 192.168.80.82, 192.168.80.146 and 192.168.80.110. This is the description of the topic named DruidAppColdStartMsg: [appuser@hz-192-168-80-146 kafka]$ bin/kafka-topics.sh --describe --topic DruidAppColdStartMsg --zookeeper 10.120.241.50:2181