[jira] [Commented] (KAFKA-3657) NewProducer NullPointerException on ProduceRequest
[ https://issues.apache.org/jira/browse/KAFKA-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271007#comment-15271007 ]

Vamsi Subhash Achanta commented on KAFKA-3657:
----------------------------------------------

I haven't tested it against the 0.9.x branch. This occurs intermittently in production, and we are still on 0.8.2.1.

> NewProducer NullPointerException on ProduceRequest
> --------------------------------------------------
>
>                 Key: KAFKA-3657
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3657
>             Project: Kafka
>          Issue Type: Bug
>          Components: network, producer
>    Affects Versions: 0.8.2.1
>         Environment: linux 3.2.0 debian7
>            Reporter: Vamsi Subhash Achanta
>            Assignee: Jun Rao
>
> On send().get() the producer appends the record batches to the accumulator, and the Sender (a separate thread) flushes them to the server. The produce request waits on the CountDownLatch in FutureRecordMetadata:
>
>     public RecordMetadata get() throws InterruptedException, ExecutionException {
>         this.result.await();
>
> Here the client thread is blocked forever waiting for the response (it is get() without a timeout), but the response polled by the Sender carries an attachment whose batch value is null. The batch is processed and the request is errored out; the Sender catches the exception at the top level and moves on. Because the accumulator has already been drained, the response will never be delivered, so the producer client thread calling get() stays blocked forever on the latch's await call.
>
> I checked at the server end but still haven't found the reason for the null batch. Any pointers on this?
>
> ERROR [2016-05-01 21:00:09,256] [kafka-producer-network-thread |producer-app] [Sender] message_id: group_id: : Uncaught error in kafka producer I/O thread:
> ! java.lang.NullPointerException: null
> ! at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:266)
> ! at org.apache.kafka.clients.producer.internals.Sender.handleResponse(Sender.java:236)
> ! at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:196)
> ! at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
> ! at java.lang.Thread.run(Thread.java:745)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
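The hang described in the ticket above follows from the latch pattern in the quoted `get()` snippet: if the I/O thread never counts the latch down, an untimed `await()` blocks forever. The following is a minimal, self-contained sketch of that pattern (it is not the real `FutureRecordMetadata` code; `LatchFuture` and its members are stand-ins), showing how a timed `get()` lets the caller recover when the response is lost:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Stand-in for the latch-backed future: get() blocks until the I/O
// thread calls complete(). If the response is dropped, an untimed
// await() hangs the caller forever; a timed await() does not.
class LatchFuture {
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile String result;

    void complete(String value) {      // called by the I/O thread
        result = value;
        latch.countDown();
    }

    // Untimed get(): blocks forever if complete() is never called.
    String get() throws InterruptedException {
        latch.await();
        return result;
    }

    // Timed get(): surfaces a lost response as a TimeoutException.
    String get(long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        if (!latch.await(timeout, unit))
            throw new TimeoutException("no response within timeout");
        return result;
    }
}

public class LatchFutureDemo {
    public static void main(String[] args) throws Exception {
        LatchFuture lost = new LatchFuture();   // never completed
        try {
            lost.get(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("timed out instead of hanging");
        }
    }
}
```

Until the null-batch cause is fixed server-side, calling the timed variant from the client thread is the usual way to avoid being parked indefinitely on the latch.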
[jira] [Created] (KAFKA-3657) NewProducer NullPointerException on ProduceRequest
Vamsi Subhash Achanta created KAFKA-3657:
--------------------------------------------

             Summary: NewProducer NullPointerException on ProduceRequest
                 Key: KAFKA-3657
                 URL: https://issues.apache.org/jira/browse/KAFKA-3657
             Project: Kafka
          Issue Type: Bug
          Components: network, producer
    Affects Versions: 0.8.2.1
         Environment: linux 3.2.0 debian7
            Reporter: Vamsi Subhash Achanta
            Assignee: Jun Rao

On send().get() the producer appends the record batches to the accumulator, and the Sender (a separate thread) flushes them to the server. The produce request waits on the CountDownLatch in FutureRecordMetadata:

    public RecordMetadata get() throws InterruptedException, ExecutionException {
        this.result.await();

Here the client thread is blocked forever waiting for the response (it is get() without a timeout), but the response polled by the Sender carries an attachment whose batch value is null. The batch is processed and the request is errored out; the Sender catches the exception at the top level and moves on. Because the accumulator has already been drained, the response will never be delivered, so the producer client thread calling get() stays blocked forever on the latch's await call.

I checked at the server end but still haven't found the reason for the null batch. Any pointers on this?

ERROR [2016-05-01 21:00:09,256] [kafka-producer-network-thread |producer-app] [Sender] message_id: group_id: : Uncaught error in kafka producer I/O thread:
! java.lang.NullPointerException: null
! at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:266)
! at org.apache.kafka.clients.producer.internals.Sender.handleResponse(Sender.java:236)
! at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:196)
! at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
! at java.lang.Thread.run(Thread.java:745)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (KAFKA-3657) NewProducer NullPointerException on ProduceRequest
[ https://issues.apache.org/jira/browse/KAFKA-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vamsi Subhash Achanta updated KAFKA-3657:
-----------------------------------------
    Description: 
On send().get() the producer appends the record batches to the accumulator, and the Sender (a separate thread) flushes them to the server. The produce request waits on the CountDownLatch in FutureRecordMetadata:

    public RecordMetadata get() throws InterruptedException, ExecutionException {
        this.result.await();

Here the client thread is blocked forever waiting for the response (it is get() without a timeout), but the response polled by the Sender carries an attachment whose batch value is null. The batch is processed and the request is errored out; the Sender catches the exception at the top level and moves on. Because the accumulator has already been drained, the response will never be delivered, so the producer client thread calling get() stays blocked forever on the latch's await call.

I checked at the server end but still haven't found the reason for the null batch. Any pointers on this?

ERROR [2016-05-01 21:00:09,256] [kafka-producer-network-thread |producer-app] [Sender] message_id: group_id: : Uncaught error in kafka producer I/O thread:
! java.lang.NullPointerException: null
! at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:266)
! at org.apache.kafka.clients.producer.internals.Sender.handleResponse(Sender.java:236)
! at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:196)
! at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
! at java.lang.Thread.run(Thread.java:745)

  was:
(Same text as above, without paragraph breaks.)

> NewProducer NullPointerException on ProduceRequest
> --------------------------------------------------
>
>                 Key: KAFKA-3657
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3657
>             Project: Kafka
>          Issue Type: Bug
>          Components: network, producer
>    Affects Versions: 0.8.2.1
>         Environment: linux 3.2.0 debian7
>            Reporter: Vamsi Subhash Achanta
>            Assignee: Jun Rao
>
> On send().get() the producer appends the record batches to the accumulator, and the Sender (a separate thread) flushes them to the server. The produce request waits on the CountDownLatch in FutureRecordMetadata:
>
>     public RecordMetadata get() throws InterruptedException, ExecutionException {
>         this.result.await();
>
> Here the client thread is blocked forever waiting for the response (it is get() without a timeout), but the response polled by the Sender carries an attachment whose batch value is null. The batch is processed and the request is errored out; the Sender catches the exception at the top level and moves on. Because the accumulator has already been drained, the response will never be delivered, so the producer client thread calling get() stays blocked forever on the latch's await call.
>
> I checked at the server end but still haven't found the reason for the null batch. Any pointers on this?
>
> ERROR [2016-05-01 21:00:09,256] [kafka-producer-network-thread |producer-app] [Sender] message_id: group_id: : Uncaught error in kafka producer I/O thread:
> ! java.lang.NullPointerException: null
> ! at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:266)
> ! at
[jira] [Commented] (KAFKA-3359) Parallel log-recovery of un-flushed segments on startup
[ https://issues.apache.org/jira/browse/KAFKA-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199544#comment-15199544 ]

Vamsi Subhash Achanta commented on KAFKA-3359:
----------------------------------------------

*bump* Can this be reviewed?

> Parallel log-recovery of un-flushed segments on startup
> -------------------------------------------------------
>
>                 Key: KAFKA-3359
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3359
>             Project: Kafka
>          Issue Type: Improvement
>          Components: log
>    Affects Versions: 0.8.2.2, 0.9.0.1
>            Reporter: Vamsi Subhash Achanta
>            Assignee: Jay Kreps
>             Fix For: 0.10.0.0
>
> On startup, the log segments within a logDir are currently loaded sequentially when there was an unclean shutdown. This takes a long time because logSegment.recover(..) is called for every segment, and for brokers with many partitions the total time is very high (we have noticed ~40 mins for 2k partitions).
>
> https://github.com/apache/kafka/pull/1035
>
> This pull request makes the log-segment load parallel, with two configurable properties, "log.recovery.threads" and "log.recovery.max.interval.ms".
>
> Logic:
> 1. Create a thread pool of fixed size (log.recovery.threads).
> 2. Submit each logSegment recovery as a job to the thread pool and add the returned future to a job list.
> 3. Wait until all the jobs are done within the requested time (log.recovery.max.interval.ms, default Long.MAX_VALUE).
> 4. If they are done and the futures are all null (meaning the jobs completed successfully), recovery is considered done.
> 5. If any recovery job failed, it is logged and LogRecoveryFailedException is thrown.
> 6. If the timeout is reached, LogRecoveryFailedException is thrown.
>
> The logic is backward compatible with the current sequential implementation, as the default thread count is set to 1.
>
> PS: I am new to Scala and the code might look Java-ish, but I will be happy to incorporate code-review changes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
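The six-step logic in the ticket above can be sketched with a plain `ExecutorService`. This is a stand-alone illustration, not the patch itself (the real change is in Kafka's Scala `LogManager`; the segment type here is a plain `Runnable`, and the property names are taken from the ticket):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Sketch of the recovery scheme: submit each segment recovery to a
// fixed-size pool and wait for all futures within a deadline.
public class ParallelRecoverySketch {

    static class LogRecoveryFailedException extends RuntimeException {
        LogRecoveryFailedException(String msg) { super(msg); }
        LogRecoveryFailedException(String msg, Throwable cause) { super(msg, cause); }
    }

    static void recoverAll(List<Runnable> segments, int threads, long maxIntervalMs) {
        // step 1: fixed-size pool (log.recovery.threads)
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // step 2: submit each segment recovery, collect the futures
            List<Future<?>> jobs = new ArrayList<>();
            for (Runnable seg : segments)
                jobs.add(pool.submit(seg));

            // step 3: wait for all jobs within log.recovery.max.interval.ms
            long deadline = System.currentTimeMillis() + maxIntervalMs;
            for (Future<?> job : jobs) {
                long left = deadline - System.currentTimeMillis();
                try {
                    job.get(Math.max(left, 0), TimeUnit.MILLISECONDS);
                } catch (ExecutionException e) {
                    // step 5: a recovery job failed
                    throw new LogRecoveryFailedException("segment recovery failed", e.getCause());
                } catch (TimeoutException e) {
                    // step 6: deadline exceeded
                    throw new LogRecoveryFailedException("log recovery timed out");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new LogRecoveryFailedException("interrupted during recovery", e);
                }
            }
            // step 4: all jobs completed without exception => done
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With `threads = 1` this degenerates to the sequential behaviour, which is what makes the ticket's default backward compatible.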
[jira] [Updated] (KAFKA-3359) Parallel log-recovery of un-flushed segments on startup
[ https://issues.apache.org/jira/browse/KAFKA-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vamsi Subhash Achanta updated KAFKA-3359:
-----------------------------------------
    Priority: Major  (was: Minor)

> Parallel log-recovery of un-flushed segments on startup
> -------------------------------------------------------
>
>                 Key: KAFKA-3359
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3359
>             Project: Kafka
>          Issue Type: Improvement
>          Components: log
>    Affects Versions: 0.8.2.2, 0.9.0.1
>            Reporter: Vamsi Subhash Achanta
>            Assignee: Jay Kreps
>             Fix For: 0.10.0.0
>
> On startup, the log segments within a logDir are currently loaded sequentially when there was an unclean shutdown. This takes a long time because logSegment.recover(..) is called for every segment, and for brokers with many partitions the total time is very high (we have noticed ~40 mins for 2k partitions).
>
> https://github.com/apache/kafka/pull/1035
>
> This pull request makes the log-segment load parallel, with two configurable properties, "log.recovery.threads" and "log.recovery.max.interval.ms".
>
> Logic:
> 1. Create a thread pool of fixed size (log.recovery.threads).
> 2. Submit each logSegment recovery as a job to the thread pool and add the returned future to a job list.
> 3. Wait until all the jobs are done within the requested time (log.recovery.max.interval.ms, default Long.MAX_VALUE).
> 4. If they are done and the futures are all null (meaning the jobs completed successfully), recovery is considered done.
> 5. If any recovery job failed, it is logged and LogRecoveryFailedException is thrown.
> 6. If the timeout is reached, LogRecoveryFailedException is thrown.
>
> The logic is backward compatible with the current sequential implementation, as the default thread count is set to 1.
>
> PS: I am new to Scala and the code might look Java-ish, but I will be happy to incorporate code-review changes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (KAFKA-3359) Parallel log-recovery of un-flushed segments on startup
[ https://issues.apache.org/jira/browse/KAFKA-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vamsi Subhash Achanta updated KAFKA-3359:
-----------------------------------------
        Reviewer: Grant Henke
   Fix Version/s: 0.10.0.0
          Status: Patch Available  (was: Open)

https://github.com/apache/kafka/pull/1035

> Parallel log-recovery of un-flushed segments on startup
> -------------------------------------------------------
>
>                 Key: KAFKA-3359
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3359
>             Project: Kafka
>          Issue Type: Improvement
>          Components: log
>    Affects Versions: 0.9.0.1, 0.8.2.2
>            Reporter: Vamsi Subhash Achanta
>            Assignee: Jay Kreps
>            Priority: Minor
>             Fix For: 0.10.0.0
>
> On startup, the log segments within a logDir are currently loaded sequentially when there was an unclean shutdown. This takes a long time because logSegment.recover(..) is called for every segment, and for brokers with many partitions the total time is very high (we have noticed ~40 mins for 2k partitions).
>
> https://github.com/apache/kafka/pull/1035
>
> This pull request makes the log-segment load parallel, with two configurable properties, "log.recovery.threads" and "log.recovery.max.interval.ms".
>
> Logic:
> 1. Create a thread pool of fixed size (log.recovery.threads).
> 2. Submit each logSegment recovery as a job to the thread pool and add the returned future to a job list.
> 3. Wait until all the jobs are done within the requested time (log.recovery.max.interval.ms, default Long.MAX_VALUE).
> 4. If they are done and the futures are all null (meaning the jobs completed successfully), recovery is considered done.
> 5. If any recovery job failed, it is logged and LogRecoveryFailedException is thrown.
> 6. If the timeout is reached, LogRecoveryFailedException is thrown.
>
> The logic is backward compatible with the current sequential implementation, as the default thread count is set to 1.
>
> PS: I am new to Scala and the code might look Java-ish, but I will be happy to incorporate code-review changes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (KAFKA-3359) Parallel log-recovery of un-flushed segments on startup
[ https://issues.apache.org/jira/browse/KAFKA-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vamsi Subhash Achanta updated KAFKA-3359:
-----------------------------------------
    Issue Type: Improvement  (was: Bug)

> Parallel log-recovery of un-flushed segments on startup
> -------------------------------------------------------
>
>                 Key: KAFKA-3359
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3359
>             Project: Kafka
>          Issue Type: Improvement
>          Components: log
>    Affects Versions: 0.8.2.2, 0.9.0.1
>            Reporter: Vamsi Subhash Achanta
>            Assignee: Jay Kreps
>            Priority: Minor
>
> On startup, the log segments within a logDir are currently loaded sequentially when there was an unclean shutdown. This takes a long time because logSegment.recover(..) is called for every segment, and for brokers with many partitions the total time is very high (we have noticed ~40 mins for 2k partitions).
>
> https://github.com/apache/kafka/pull/1035
>
> This pull request makes the log-segment load parallel, with two configurable properties, "log.recovery.threads" and "log.recovery.max.interval.ms".
>
> Logic:
> 1. Create a thread pool of fixed size (log.recovery.threads).
> 2. Submit each logSegment recovery as a job to the thread pool and add the returned future to a job list.
> 3. Wait until all the jobs are done within the requested time (log.recovery.max.interval.ms, default Long.MAX_VALUE).
> 4. If they are done and the futures are all null (meaning the jobs completed successfully), recovery is considered done.
> 5. If any recovery job failed, it is logged and LogRecoveryFailedException is thrown.
> 6. If the timeout is reached, LogRecoveryFailedException is thrown.
>
> The logic is backward compatible with the current sequential implementation, as the default thread count is set to 1.
>
> PS: I am new to Scala and the code might look Java-ish, but I will be happy to incorporate code-review changes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (KAFKA-3359) Parallel log-recovery of un-flushed segments on startup
Vamsi Subhash Achanta created KAFKA-3359:
--------------------------------------------

             Summary: Parallel log-recovery of un-flushed segments on startup
                 Key: KAFKA-3359
                 URL: https://issues.apache.org/jira/browse/KAFKA-3359
             Project: Kafka
          Issue Type: Bug
          Components: log
    Affects Versions: 0.9.0.1, 0.8.2.2
            Reporter: Vamsi Subhash Achanta
            Assignee: Jay Kreps
            Priority: Minor

On startup, the log segments within a logDir are currently loaded sequentially when there was an unclean shutdown. This takes a long time because logSegment.recover(..) is called for every segment, and for brokers with many partitions the total time is very high (we have noticed ~40 mins for 2k partitions).

https://github.com/apache/kafka/pull/1035

This pull request makes the log-segment load parallel, with two configurable properties, "log.recovery.threads" and "log.recovery.max.interval.ms".

Logic:
1. Create a thread pool of fixed size (log.recovery.threads).
2. Submit each logSegment recovery as a job to the thread pool and add the returned future to a job list.
3. Wait until all the jobs are done within the requested time (log.recovery.max.interval.ms, default Long.MAX_VALUE).
4. If they are done and the futures are all null (meaning the jobs completed successfully), recovery is considered done.
5. If any recovery job failed, it is logged and LogRecoveryFailedException is thrown.
6. If the timeout is reached, LogRecoveryFailedException is thrown.

The logic is backward compatible with the current sequential implementation, as the default thread count is set to 1.

PS: I am new to Scala and the code might look Java-ish, but I will be happy to incorporate code-review changes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (KAFKA-2163) Offsets manager cache should prevent stale-offset-cleanup while an offset load is in progress; otherwise we can lose consumer offsets
[ https://issues.apache.org/jira/browse/KAFKA-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15185020#comment-15185020 ]

Vamsi Subhash Achanta commented on KAFKA-2163:
----------------------------------------------

Can this be merged into the 0.8.2.2 branch?

> Offsets manager cache should prevent stale-offset-cleanup while an offset load is in progress; otherwise we can lose consumer offsets
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2163
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2163
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>             Fix For: 0.9.0.0
>
>         Attachments: KAFKA-2163.patch
>
> When leadership of an offsets partition moves, the new leader loads offsets from that partition into the offset manager cache.
>
> Independently, the offset manager has a periodic cleanup task for stale offsets that removes old offsets from the cache and appends tombstones for those. If the partition happens to contain much older offsets (earlier in the log), the load inserts those into the cache; the cleanup task may then run, see those offsets (which it deems stale), remove them from the cache, and append a tombstone to the end of the log. The tombstone will override the true latest offset, and a subsequent offset fetch request will return no offset.
>
> We just need to prevent the cleanup task from running during an offset load.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
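The guard described in the ticket above can be sketched with a "partitions currently loading" set that the cleanup task consults. This is an illustration of the idea, not the actual Kafka patch (class and method names here are invented):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy offset cache: the cleanup task skips any partition whose
// offsets are still being loaded, so old offsets (re)inserted by
// the load are not tombstoned out from under it.
public class OffsetCacheSketch {
    private final Map<String, Long> offsets = new ConcurrentHashMap<>();
    private final Set<String> loadingPartitions = ConcurrentHashMap.newKeySet();

    void beginLoad(String partition) { loadingPartitions.add(partition); }
    void endLoad(String partition)   { loadingPartitions.remove(partition); }
    void put(String partition, long offset) { offsets.put(partition, offset); }
    Long get(String partition) { return offsets.get(partition); }

    // Periodic cleanup: remove "stale" offsets (here: below a cutoff),
    // but never from a partition that is mid-load.
    void cleanupStale(long staleBelow) {
        offsets.entrySet().removeIf(e ->
            !loadingPartitions.contains(e.getKey()) && e.getValue() < staleBelow);
    }
}
```

The essential property is that cleanup and load never race on the same partition; the real fix needs the same mutual exclusion, whatever mechanism it uses.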
[jira] [Commented] (KAFKA-2967) Move Kafka documentation to ReStructuredText
[ https://issues.apache.org/jira/browse/KAFKA-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177342#comment-15177342 ]

Vamsi Subhash Achanta commented on KAFKA-2967:
----------------------------------------------

Emacs Org mode works very well for documentation: http://orgmode.org/ It can generate LaTeX output as well, and developers find writing documentation in it a pleasure. It can also be used by people who don't use Emacs.

> Move Kafka documentation to ReStructuredText
> --------------------------------------------
>
>                 Key: KAFKA-2967
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2967
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Gwen Shapira
>            Assignee: Gwen Shapira
>
> Storing documentation as HTML is kind of BS :)
> * Formatting is a pain, and making it look good is even worse
> * It's just HTML, can't generate PDFs
> * Reading and editing is painful
> * Validating changes is hard because our formatting relies on all kinds of Apache Server features.
>
> I suggest:
> * Move to RST
> * Generate HTML and PDF during build using the Sphinx plugin for Gradle.
>
> Lots of Apache projects are doing this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (KAFKA-1229) Reload broker config without a restart
[ https://issues.apache.org/jira/browse/KAFKA-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557737#comment-14557737 ]

Vamsi Subhash Achanta commented on KAFKA-1229:
----------------------------------------------

Can this be revisited? At least the zookeeper.connect string?

> Reload broker config without a restart
> --------------------------------------
>
>                 Key: KAFKA-1229
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1229
>             Project: Kafka
>          Issue Type: Wish
>          Components: config
>    Affects Versions: 0.8.0
>            Reporter: Carlo Cabanilla
>            Priority: Minor
>
> In order to minimize client disruption, ideally you'd be able to reload broker config without having to restart the server. On *nix systems the convention is to have the process reread its configuration if it receives a SIGHUP signal.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (KAFKA-1977) Make logEndOffset available in the Zookeeper consumer
[ https://issues.apache.org/jira/browse/KAFKA-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549252#comment-14549252 ]

Vamsi Subhash Achanta commented on KAFKA-1977:
----------------------------------------------

Hi Will. I have also applied this patch, to 0.8.2.2. All the tests passed and I am now using it in production. It has been giving correct values for the last 8 hours. Thanks for the patch.

> Make logEndOffset available in the Zookeeper consumer
> -----------------------------------------------------
>
>                 Key: KAFKA-1977
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1977
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Will Funnell
>            Priority: Minor
>
>         Attachments: Make_logEndOffset_available_in_the_Zookeeper_consumer.patch
>
> The requirement is to create a snapshot from the Kafka topic but NOT do continual reads after that point. For example, you might be creating a backup of the data to a file.
>
> In order to achieve that, a solution recommended by Joel Koshy and Jay Kreps was to expose the high watermark, as maxEndOffset, from the FetchResponse object through to each MessageAndMetadata object, so the consumer can tell when it has reached the end of each partition.
>
> The submitted patch achieves this by adding the maxEndOffset to the PartitionTopicInfo, which is updated when a new message arrives in the ConsumerFetcherThread, and then exposing it in MessageAndMetadata.
>
> See here for discussion: http://search-hadoop.com/m/4TaT4TpJy71

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
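The snapshot pattern the patch above enables can be sketched as: consume each message and stop once its offset reaches the high watermark (maxEndOffset) carried with it. This is a stand-alone illustration with invented types (the real field lives on MessageAndMetadata), and it assumes the high watermark is one past the last offset, which is how this sketch defines it:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a stream of MessageAndMetadata: each message carries
// its own offset plus the partition's high watermark (maxEndOffset).
public class SnapshotConsumerSketch {
    record Message(long offset, long maxEndOffset, String value) {}

    // Collect a snapshot: take messages until the last one before the
    // high watermark, then stop instead of reading forever.
    static List<String> snapshot(Iterable<Message> stream) {
        List<String> out = new ArrayList<>();
        for (Message m : stream) {
            out.add(m.value());
            if (m.offset() >= m.maxEndOffset() - 1)
                break; // reached the end of the partition: snapshot done
        }
        return out;
    }
}
```

The point of exposing maxEndOffset per message is exactly this termination test; without it the high-level consumer gives no way to know you have drained a partition.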
[jira] [Comment Edited] (KAFKA-1977) Make logEndOffset available in the Zookeeper consumer
[ https://issues.apache.org/jira/browse/KAFKA-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549252#comment-14549252 ]

Vamsi Subhash Achanta edited comment on KAFKA-1977 at 5/18/15 9:21 PM:
-----------------------------------------------------------------------

Hi Will. I have also applied this patch, to 0.8.2.1. All the tests passed and I am now using it in production. It has been giving correct values for all consumers. Thanks for the patch.

was (Author: vamsi360):
Hi Will. I have also applied this patch, to 0.8.2.2. All the tests passed and I am now using it in production. It has been giving correct values for the last 8 hours. Thanks for the patch.

> Make logEndOffset available in the Zookeeper consumer
> -----------------------------------------------------
>
>                 Key: KAFKA-1977
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1977
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Will Funnell
>            Priority: Minor
>
>         Attachments: Make_logEndOffset_available_in_the_Zookeeper_consumer.patch
>
> The requirement is to create a snapshot from the Kafka topic but NOT do continual reads after that point. For example, you might be creating a backup of the data to a file.
>
> In order to achieve that, a solution recommended by Joel Koshy and Jay Kreps was to expose the high watermark, as maxEndOffset, from the FetchResponse object through to each MessageAndMetadata object, so the consumer can tell when it has reached the end of each partition.
>
> The submitted patch achieves this by adding the maxEndOffset to the PartitionTopicInfo, which is updated when a new message arrives in the ConsumerFetcherThread, and then exposing it in MessageAndMetadata.
>
> See here for discussion: http://search-hadoop.com/m/4TaT4TpJy71

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (KAFKA-1977) Make logEndOffset available in the Zookeeper consumer
[ https://issues.apache.org/jira/browse/KAFKA-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547606#comment-14547606 ]

Vamsi Subhash Achanta commented on KAFKA-1977:
----------------------------------------------

Can this be merged into the old consumer code in the 0.8.2.x branch? We have the same kind of requirement.

> Make logEndOffset available in the Zookeeper consumer
> -----------------------------------------------------
>
>                 Key: KAFKA-1977
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1977
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Will Funnell
>            Priority: Minor
>
>         Attachments: Make_logEndOffset_available_in_the_Zookeeper_consumer.patch
>
> The requirement is to create a snapshot from the Kafka topic but NOT do continual reads after that point. For example, you might be creating a backup of the data to a file.
>
> In order to achieve that, a solution recommended by Joel Koshy and Jay Kreps was to expose the high watermark, as maxEndOffset, from the FetchResponse object through to each MessageAndMetadata object, so the consumer can tell when it has reached the end of each partition.
>
> The submitted patch achieves this by adding the maxEndOffset to the PartitionTopicInfo, which is updated when a new message arrives in the ConsumerFetcherThread, and then exposing it in MessageAndMetadata.
>
> See here for discussion: http://search-hadoop.com/m/4TaT4TpJy71

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (KAFKA-2134) Producer blocked on metric publish
[ https://issues.apache.org/jira/browse/KAFKA-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vamsi Subhash Achanta updated KAFKA-2134:
-----------------------------------------
    Priority: Blocker  (was: Major)

> Producer blocked on metric publish
> ----------------------------------
>
>                 Key: KAFKA-2134
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2134
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer
>    Affects Versions: 0.8.2.1
>         Environment: debian7, java8
>            Reporter: Vamsi Subhash Achanta
>            Assignee: Jun Rao
>            Priority: Blocker
>
> Hi,
> We have a REST API to publish to a topic. Yesterday we started noticing that the producer is not able to produce messages at a good rate and the CLOSE_WAITs of our producer REST app are very high; all the producer REST requests are therefore timing out. When we took a thread dump and analysed it, we noticed that the threads are blocked on JmxReporter.metricChange. Here is the stack trace:
>
> dw-70 - POST /queues/queue_1/messages #70 prio=5 os_prio=0 tid=0x7f043c8bd000 nid=0x54cf waiting for monitor entry [0x7f04363c7000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.kafka.common.metrics.JmxReporter.metricChange(JmxReporter.java:76)
>     - waiting to lock 0x0005c1823860 (a java.lang.Object)
>     at org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:182)
>     - locked 0x0007a5e526c8 (a org.apache.kafka.common.metrics.Metrics)
>     at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:165)
>     - locked 0x0007a5e526e8 (a org.apache.kafka.common.metrics.Sensor)
>
> When I looked at the code of the metricChange method, it synchronizes on an object monitor, and it appears that the monitor is held by another thread.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (KAFKA-2134) Producer blocked on metric publish
[ https://issues.apache.org/jira/browse/KAFKA-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509083#comment-14509083 ]

Vamsi Subhash Achanta commented on KAFKA-2134:
----------------------------------------------

We noticed this intermittently. What could be the problem?

> Producer blocked on metric publish
> ----------------------------------
>
>                 Key: KAFKA-2134
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2134
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer
>    Affects Versions: 0.8.2.1
>         Environment: debian7, java8
>            Reporter: Vamsi Subhash Achanta
>            Assignee: Jun Rao
>            Priority: Blocker
>
> Hi,
> We have a REST API to publish to a topic. Yesterday we started noticing that the producer is not able to produce messages at a good rate and the CLOSE_WAITs of our producer REST app are very high; all the producer REST requests are therefore timing out. When we took a thread dump and analysed it, we noticed that the threads are blocked on JmxReporter.metricChange. Here is the stack trace:
>
> dw-70 - POST /queues/queue_1/messages #70 prio=5 os_prio=0 tid=0x7f043c8bd000 nid=0x54cf waiting for monitor entry [0x7f04363c7000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.kafka.common.metrics.JmxReporter.metricChange(JmxReporter.java:76)
>     - waiting to lock 0x0005c1823860 (a java.lang.Object)
>     at org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:182)
>     - locked 0x0007a5e526c8 (a org.apache.kafka.common.metrics.Metrics)
>     at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:165)
>     - locked 0x0007a5e526e8 (a org.apache.kafka.common.metrics.Sensor)
>
> When I looked at the code of the metricChange method, it synchronizes on an object monitor, and it appears that the monitor is held by another thread.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (KAFKA-2134) Producer blocked on metric publish
[ https://issues.apache.org/jira/browse/KAFKA-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vamsi Subhash Achanta updated KAFKA-2134: - Description: Hi, We have a REST api to publish to a topic. Yesterday, we started noticing that the producer is not able to produce messages at a good rate and the CLOSE_WAITs of our producer REST app are very high. All the producer REST requests are hence timing out. When we took the thread dump and analysed it, we noticed that the threads are getting blocked on JmxReporter metricChange. Here is the attached stack trace. dw-70 - POST /queues/queue_1/messages #70 prio=5 os_prio=0 tid=0x7f043c8bd000 nid=0x54cf waiting for monitor entry [0x7f04363c7000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.kafka.common.metrics.JmxReporter.metricChange(JmxReporter.java:76) - waiting to lock 0x0005c1823860 (a java.lang.Object) at org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:182) - locked 0x0007a5e526c8 (a org.apache.kafka.common.metrics.Metrics) at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:165) - locked 0x0007a5e526e8 (a org.apache.kafka.common.metrics.Sensor) When I looked at the code of metricChange method, it uses a synchronised block on an object resource and it seems that it is held by another. was: Hi, We have a REST api to publish to a topic. Yesterday, we started noticing that the producer is not able to produce messages at a good rate and the CLOSE_WAITs of our producer REST app are very high. All the producer REST requests are hence timing out. When we took the thread dump and analysed it, we noticed that the threads are getting blocked on JmxReporter metricChange. Here is the attached stack trace. 
dw-70 - POST /queues/ekl_bigfoot_marvin_production_1/messages #70 prio=5 os_prio=0 tid=0x7f043c8bd000 nid=0x54cf waiting for monitor entry [0x7f04363c7000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.kafka.common.metrics.JmxReporter.metricChange(JmxReporter.java:76)
    - waiting to lock 0x0005c1823860 (a java.lang.Object)
    at org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:182)
    - locked 0x0007a5e526c8 (a org.apache.kafka.common.metrics.Metrics)
    at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:165)
    - locked 0x0007a5e526e8 (a org.apache.kafka.common.metrics.Sensor)
When I looked at the code of metricChange method, it uses a synchronised block on an object resource and it seems that it is held by another.
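The contention pattern in these thread dumps - every metric and sensor registration funnelling through one shared lock in JmxReporter.metricChange - can be reproduced in miniature. The sketch below is illustrative only; the class and method names are hypothetical stand-ins, not Kafka's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch (not Kafka code) of the reported contention: a
 * metricChange-style callback serialized on one shared lock, so every
 * request thread that registers a sensor/metric queues up behind it.
 */
public class MetricLockContention {
    private static final Object LOCK = new Object();          // stands in for JmxReporter's lock
    private static final List<String> REGISTRY = new ArrayList<>();

    // Stand-in for JmxReporter.metricChange: all callers serialize on LOCK.
    static void metricChange(String metricName) {
        synchronized (LOCK) {
            REGISTRY.add(metricName);
            // Simulate slow MBean bookkeeping while the lock is held.
            try { Thread.sleep(1); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }

    static int registeredCount() {
        synchronized (LOCK) { return REGISTRY.size(); }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime();
        // Many request threads registering per-topic metrics, as a REST app would.
        for (int t = 0; t < 8; t++) {
            final int id = t;
            Thread th = new Thread(() -> {
                for (int i = 0; i < 50; i++) metricChange("topic-" + id + ".metric-" + i);
            });
            threads.add(th);
            th.start();
        }
        for (Thread th : threads) th.join();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // 400 registrations serialized behind one lock: elapsed time scales
        // with total registrations, not with the number of threads.
        System.out.println("registered=" + registeredCount() + " elapsedMs=" + elapsedMs);
    }
}
```

Run under a profiler or jstack, the worker threads here show the same BLOCKED-on-monitor state as the dump above: they spend most of their time waiting to enter the synchronized block rather than doing work.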
[jira] [Updated] (KAFKA-1791) Corrupt index after safe shutdown and restart
[ https://issues.apache.org/jira/browse/KAFKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vamsi Subhash Achanta updated KAFKA-1791:
-
Attachment: 0233.log
Attached the log file.
> Corrupt index after safe shutdown and restart
> --
>
> Key: KAFKA-1791
> URL: https://issues.apache.org/jira/browse/KAFKA-1791
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8.1
> Environment: Debian6 with Sun-Java6
> Reporter: Vamsi Subhash Achanta
> Priority: Critical
> Attachments: 0233.index, 0233.log
>
> We have 3 Kafka brokers - all VMs. One of the brokers was stopped for around 30 minutes to fix a problem with the bare metal. Upon restart, after some time, the broker ran out of file descriptors (FDs) and started throwing errors. Upon restart, it started throwing these corrupted-index exceptions. I followed the other JIRAs related to corrupted indices, but the solutions mentioned there (deleting the index and restarting) didn't work - the index gets created again. The other JIRAs' solution of deleting those indexes which got wrongly compacted (> 10MB) didn't work either. What is the error? How can I fix this and bring back the broker? Thanks.
> INFO [2014-11-21 02:57:17,510] [main][] kafka.log.LogManager - Found clean shutdown file. Skipping recovery for all logs in data directory '/var/lib/fk-3p-kafka/logs'
> INFO [2014-11-21 02:57:17,510] [main][] kafka.log.LogManager - Loading log 'kf.production.b2b.return_order.status-25'
> FATAL [2014-11-21 02:57:17,533] [main][] kafka.server.KafkaServerStartable - Fatal error during KafkaServerStable startup. Prepare to shutdown
> java.lang.IllegalArgumentException: requirement failed: Corrupt index found, index file (/var/lib/fk-3p-kafka/logs/kf.production.b2b.return_order.status-25/0233.index) has non-zero size but the last offset is 233 and the base offset is 233
>     at scala.Predef$.require(Predef.scala:145)
>     at kafka.log.OffsetIndex.sanityCheck(OffsetIndex.scala:352)
>     at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:159)
>     at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:158)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:631)
>     at scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:474)
>     at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
>     at scala.collection.JavaConversions$JCollectionWrapper.foreach(JavaConversions.scala:495)
>     at kafka.log.Log.loadSegments(Log.scala:158)
>     at kafka.log.Log.<init>(Log.scala:64)
>     at kafka.log.LogManager$$anonfun$loadLogs$1$$anonfun$apply$4.apply(LogManager.scala:118)
>     at kafka.log.LogManager$$anonfun$loadLogs$1$$anonfun$apply$4.apply(LogManager.scala:113)
>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
>     at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:34)
>     at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:113)
>     at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:105)
>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
>     at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:32)
>     at kafka.log.LogManager.loadLogs(LogManager.scala:105)
>     at kafka.log.LogManager.<init>(LogManager.scala:57)
>     at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:275)
>     at kafka.server.KafkaServer.startup(KafkaServer.scala:72)
>     at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
>     at kafka.Kafka$.main(Kafka.scala:46)
>     at kafka.Kafka.main(Kafka.scala)
> INFO [2014-11-21 02:57:17,534] [main][] kafka.server.KafkaServer - [Kafka Server 2], shutting down
> INFO [2014-11-21 02:57:17,538] [main][] kafka.server.KafkaServer - [Kafka Server 2], shut down completed
> INFO [2014-11-21 02:57:17,539] [Thread-2][] kafka.server.KafkaServer - [Kafka Server 2], shutting down
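The IllegalArgumentException in this log comes from OffsetIndex.sanityCheck refusing to load a non-empty index file whose last offset is not larger than the segment's base offset. The block below is a simplified Java rendering of that check (Kafka's real implementation is Scala and reads the last entry from the memory-mapped file; the method shown here is a hypothetical stand-in):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Simplified sketch (not Kafka's actual code) of the invariant behind
 * kafka.log.OffsetIndex.sanityCheck: an index file with non-zero size
 * must contain at least one entry, i.e. its last indexed offset must be
 * strictly larger than the segment's base offset.
 */
public class IndexSanityCheck {

    static void sanityCheck(File indexFile, long baseOffset, long lastOffset) {
        if (indexFile.length() > 0 && lastOffset <= baseOffset) {
            throw new IllegalArgumentException(
                "Corrupt index found, index file (" + indexFile.getPath()
                + ") has non-zero size but the last offset is " + lastOffset
                + " and the base offset is " + baseOffset);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("0233", ".index");
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(40 * 1024);   // a non-empty file, ~40 KB as in the report
        }
        try {
            // last offset == base offset (233 == 233), mirroring the log line above
            sanityCheck(f, 233L, 233L);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        } finally {
            f.delete();
        }
    }
}
```

Deleting a .index file normally makes the broker rebuild it from the .log segment on the next startup; the reporter notes that in this case the rebuilt index failed the same check again, which points at the segment data itself rather than a one-off bad index file.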
[jira] [Updated] (KAFKA-1791) Corrupt index after safe shutdown and restart
[ https://issues.apache.org/jira/browse/KAFKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vamsi Subhash Achanta updated KAFKA-1791:
-
Attachment: 0233.index
This was over a period of time when the exception is thrown again and again. The file size when copied increased to 10MB. Was it compressed?
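One plausible explanation for the 10MB size seen when copying the index, offered here as an assumption rather than a confirmed diagnosis: Kafka preallocates offset index files for memory-mapping (up to log.index.size.max.bytes, which defaults to 10MB) and trims them to their used size only on a clean close, so an index from an unclean shutdown can legitimately occupy the full preallocated length. The sketch below demonstrates the preallocation effect; it is not Kafka code.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Sketch (not Kafka code) of why an index file can report ~10MB on disk:
 * preallocating the file for mmap sets its length up front, regardless of
 * how little data is actually written into it.
 */
public class PreallocatedIndexDemo {
    static final long MAX_INDEX_SIZE = 10 * 1024 * 1024; // mirrors the 10MB default

    public static void main(String[] args) throws IOException {
        File index = File.createTempFile("0233", ".index");
        try (RandomAccessFile raf = new RandomAccessFile(index, "rw")) {
            raf.setLength(MAX_INDEX_SIZE); // preallocate the whole file
            raf.writeLong(233L);           // only one 8-byte entry actually used
        }
        // The on-disk length reflects the preallocation, not the data written.
        System.out.println("on-disk size = " + index.length() + " bytes");
        index.delete();
    }
}
```

So the 10MB copy need not mean the file was compressed or compacted; a truncated/untrimmed preallocated index would show the same size.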
[jira] [Commented] (KAFKA-1791) Corrupt index after safe shutdown and restart
[ https://issues.apache.org/jira/browse/KAFKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221851#comment-14221851 ] Vamsi Subhash Achanta commented on KAFKA-1791:
-
Attached the file.
[jira] [Created] (KAFKA-1791) Corrupt index after safe shutdown and restart
Vamsi Subhash Achanta created KAFKA-1791:
Summary: Corrupt index after safe shutdown and restart
Key: KAFKA-1791
URL: https://issues.apache.org/jira/browse/KAFKA-1791
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8.1
Environment: Debian6 with Sun-Java6
Reporter: Vamsi Subhash Achanta
Priority: Critical
[jira] [Commented] (KAFKA-1791) Corrupt index after safe shutdown and restart
[ https://issues.apache.org/jira/browse/KAFKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220499#comment-14220499 ] Vamsi Subhash Achanta commented on KAFKA-1791:
-
It's around 40 KB.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)