Ewen Cheslack-Postava created KAFKA-1721:
--------------------------------------------

             Summary: Snappy compressor is not thread safe
                 Key: KAFKA-1721
                 URL: https://issues.apache.org/jira/browse/KAFKA-1721
             Project: Kafka
          Issue Type: Bug
            Reporter: Ewen Cheslack-Postava
            Assignee: Ewen Cheslack-Postava


>From the mailing list, it can generate this exception:

2014-10-20 18:55:21.841 [kafka-producer-network-thread] ERROR
org.apache.kafka.clients.producer.internals.Sender - Uncaught error in
kafka producer I/O thread:
*java.lang.NullPointerException*
at
org.xerial.snappy.BufferRecycler.releaseInputBuffer(BufferRecycler.java:153)
at org.xerial.snappy.SnappyOutputStream.close(SnappyOutputStream.java:317)
at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
at org.apache.kafka.common.record.Compressor.close(Compressor.java:94)
at
org.apache.kafka.common.record.MemoryRecords.close(MemoryRecords.java:119)
at
org.apache.kafka.clients.producer.internals.RecordAccumulator.drain(RecordAccumulator.java:285)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
at java.lang.Thread.run(Thread.java:744)

This appears to be an issue with the snappy-java library using ThreadLocal for 
an internal buffer recycling object which results in that object being shared 
unsafely across threads if one thread sends to multiple producers:

{quote}
I think the issue is that you're
using all your producers across a thread pool and the snappy library
uses ThreadLocal BufferRecyclers. When new Snappy streams are allocated,
they may be allocated from the same thread (e.g. one of your MyProducer
classes calls Producer.send() on multiple producers from the same
thread) and therefore use the same BufferRecycler. Eventually you hit
the code in the stacktrace, and if two producer send threads hit it
concurrently they improperly share the unsynchronized BufferRecycler.

This seems like a pain to fix -- it's really a deficiency of the snappy
library and as far as I can see there's no external control over
BufferRecycler in their API. One possibility is to record the thread ID
when we generate a new stream in Compressor and use that to synchronize
access to ensure no concurrent BufferRecycler access. That could be made
specific to snappy so it wouldn't impact other codecs. Not exactly
ideal, but it would work. Unfortunately I can't think of any way for you
to protect against this in your own code since the problem arises in the
producer send thread, which your code should never know about.

Another option would be to setup your producers differently to avoid the
possibility of unsynchronized access from multiple threads (i.e. don't
use the same thread pool approach), but whether you can do that will
depend on your use case.
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to