Re: failed with LeaderNotAvailableError -

2015-12-17 Thread Ben Davison
Hi David,

Are you running in docker? Are you trying to connect from a remote box?
We found we could connect locally but couldn't connect from another remote
host.

(I've just started using kafka also)

We had the same issue and found out: host.name=<%=@ipaddress%> needed to be
the FQDN of the box.

Thanks,

Ben

On Thu, Dec 17, 2015 at 5:40 AM, David Montgomery  wrote:

> Hi,
>
> I am very concerned about using kafka in production given the below
> errors:
>
> Now issues with my zookeeper.  Other services use ZK.  Only kafka fails.
> I have 2 kafka servers using 8.x.  How do I resolve?  I tried restarting
> services for kafka.  Below is my kafka server.properties file
>
> 'Traceback (most recent call last):
>   File
>
> "/usr/local/lib/python2.7/dist-packages/gevent-1.1b6-py2.7-linux-x86_64.egg/gevent/greenlet.py",
> line 523, in run
> result = self._run(*self.args, **self.kwargs)
>   File "/var/feed-server/ad-server/pixel-server.py", line 145, in
> send_kafka_message
> res = producer.send_messages(topic, message)
>   File "build/bdist.linux-x86_64/egg/kafka/producer/simple.py", line 52, in
> send_messages
> partition = self._next_partition(topic)
>   File "build/bdist.linux-x86_64/egg/kafka/producer/simple.py", line 36, in
> _next_partition
> self.client.load_metadata_for_topics(topic)
>   File "build/bdist.linux-x86_64/egg/kafka/client.py", line 383, in
> load_metadata_for_topics
> kafka.common.check_error(topic_metadata)
>   File "build/bdist.linux-x86_64/egg/kafka/common.py", line 233, in
> check_error
> raise error_class(response)
> LeaderNotAvailableError: TopicMetadata(topic='topic-test-production',
> error=5, partitions=[])
>  '{"adfadfadf)> failed with LeaderNotAvailableError
>
>
>
>
>
>
>
>
> # limitations under the License.
> # see kafka.server.KafkaConfig for additional details and defaults
>
> # Server Basics #
>
> # The id of the broker. This must be set to a unique integer for each
> broker.
> broker.id=<%=@broker_id%>
> advertised.host.name=<%=@ipaddress%>
> advertised.port=9092
> # Socket Server Settings
> #
>
> # The port the socket server listens on
> port=9092
>
> # Hostname the broker will bind to and advertise to producers and
> consumers.
> # If not set, the server will bind to all interfaces and advertise the
> value returned from
> # from java.net.InetAddress.getCanonicalHostName().
> host.name=<%=@ipaddress%>
>
> # The number of threads handling network requests
> num.network.threads=2
>
> # The number of threads doing disk I/O
> num.io.threads=2
>
> # The send buffer (SO_SNDBUF) used by the socket server
> socket.send.buffer.bytes=1048576
>
> # The receive buffer (SO_RCVBUF) used by the socket server
> socket.receive.buffer.bytes=1048576
>
> # The maximum size of a request that the socket server will accept
> (protection against OOM)
> socket.request.max.bytes=104857600
>
>
> # Log Basics #
>
> # A comma seperated list of directories under which to store log files
> log.dirs=/tmp/kafka-logs
>
> # The number of logical partitions per topic per server. More partitions
> allow greater parallelism
> # for consumption, but also mean more files.
> num.partitions=2
>
> # Log Flush Policy
> #
>
> # The following configurations control the flush of data to disk. This is
> among the most
> # important performance knob in kafka.
> # There are a few important trade-offs here:
> #1. Durability: Unflushed data may be lost if you are not using
> replication.
> #2. Latency: Very large flush intervals may lead to latency spikes when
> the flush does occur as there will be a lot of data to flush.
> #3. Throughput: The flush is generally the most expensive operation,
> and a small flush interval may lead to exceessive seeks.
> # The settings below allow one to configure the flush policy to flush data
> after a period of time or
> # every N messages (or both). This can be done globally and overridden on a
> per-topic basis.
>
> # The number of messages to accept before forcing a flush of data to disk
> log.flush.interval.messages=1
>
> # The maximum amount of time a message can sit in a log before we force a
> flush
> log.flush.interval.ms=1000
>
> # Per-topic overrides for log.flush.interval.ms
> #log.flush.intervals.ms.per.topic=topic1:1000, topic2:3000
>
> # Log Retention Policy
> #
>
> # The following configurations control the disposal of log segments. The
> policy can
> # be set to delete segments after a period of time, or after a given size
> has accumulated.
> # A segment will be deleted whenever *either* of these criteria are met.
> Deletion always happens
> # from the end of the log.
>
> # The minimum age of a log file to be eligible for 

Re: failed with LeaderNotAvailableError -

2015-12-17 Thread Ben Davison
I probably should have mentioned that this was using Amazon ECS.

On Thu, Dec 17, 2015 at 12:18 PM, Marko Bonaći 
wrote:

> It doesn't have to be FQDN.
>
> Here's how I run Kafka in a container:
> docker run --name st-kafka -p 2181:2181 -p 9092:9092 -e
> ADVERTISED_HOST=`docker-machine ip dev-st` -e ADVERTISED_PORT=9092 -d
> spotify/kafka
>
> And then you have access to Kafka on the docker host VM from any other
> machine.
> BTW I use Spotify's image since it contains both ZK and Kafka, but I think
> the latest version they built is 0.8.2.1, so you might have to build the
> new image yourself if you need 0.9, but that's trivial to do.
>
> Marko Bonaći
> Monitoring | Alerting | Anomaly Detection | Centralized Log Management
> Solr & Elasticsearch Support
> Sematext  | Contact
> 
>
> On Thu, Dec 17, 2015 at 11:33 AM, Ben Davison 
> wrote:
>
> > Hi David,
> >
> > Are you running in docker? Are you trying to connect from a remote
> box?
> > We found we could connect locally but couldn't connect from another
> remote
> > host.
> >
> > (I've just started using kafka also)
> >
> > We had the same issue and found out: host.name=<%=@ipaddress%> needed to
> > be
> > the FQDN of the box.
> >
> > Thanks,
> >
> > Ben
> >
> > On Thu, Dec 17, 2015 at 5:40 AM, David Montgomery <
> > davidmontgom...@gmail.com
> > > wrote:
> >
> > > Hi,
> > >
> > > I am very concerned about using kafka in production given the below
> > > errors:
> > >
> > > Now issues with my zookeeper.  Other services use ZK.  Only kafka
> fails.
> > > I have 2 kafka servers using 8.x.  How do I resolve?  I tried
> restarting
> > > services for kafka.  Below is my kafka server.properties file
> > >
> > > 'Traceback (most recent call last):
> > >   File
> > >
> > >
> >
> "/usr/local/lib/python2.7/dist-packages/gevent-1.1b6-py2.7-linux-x86_64.egg/gevent/greenlet.py",
> > > line 523, in run
> > > result = self._run(*self.args, **self.kwargs)
> > >   File "/var/feed-server/ad-server/pixel-server.py", line 145, in
> > > send_kafka_message
> > > res = producer.send_messages(topic, message)
> > >   File "build/bdist.linux-x86_64/egg/kafka/producer/simple.py", line
> 52,
> > in
> > > send_messages
> > > partition = self._next_partition(topic)
> > >   File "build/bdist.linux-x86_64/egg/kafka/producer/simple.py", line
> 36,
> > in
> > > _next_partition
> > > self.client.load_metadata_for_topics(topic)
> > >   File "build/bdist.linux-x86_64/egg/kafka/client.py", line 383, in
> > > load_metadata_for_topics
> > > kafka.common.check_error(topic_metadata)
> > >   File "build/bdist.linux-x86_64/egg/kafka/common.py", line 233, in
> > > check_error
> > > raise error_class(response)
> > > LeaderNotAvailableError: TopicMetadata(topic='topic-test-production',
> > > error=5, partitions=[])
> > >  send_kafka_message('topic-test-production',
> > > '{"adfadfadf)> failed with LeaderNotAvailableError
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > # limitations under the License.
> > > # see kafka.server.KafkaConfig for additional details and defaults
> > >
> > > # Server Basics
> #
> > >
> > > # The id of the broker. This must be set to a unique integer for each
> > > broker.
> > > broker.id=<%=@broker_id%>
> > > advertised.host.name=<%=@ipaddress%>
> > > advertised.port=9092
> > > # Socket Server Settings
> > > #
> > >
> > > # The port the socket server listens on
> > > port=9092
> > >
> > > # Hostname the broker will bind to and advertise to producers and
> > > consumers.
> > > # If not set, the server will bind to all interfaces and advertise the
> > > value returned from
> > > # from java.net.InetAddress.getCanonicalHostName().
> > > host.name=<%=@ipaddress%>
> > >
> > > # The number of threads handling network requests
> > > num.network.threads=2
> > >
> > > # The number of threads doing disk I/O
> > > num.io.threads=2
> > >
> > > # The send buffer (SO_SNDBUF) used by the socket server
> > > socket.send.buffer.bytes=1048576
> > >
> > > # The receive buffer (SO_RCVBUF) used by the socket server
> > > socket.receive.buffer.bytes=1048576
> > >
> > > # The maximum size of a request that the socket server will accept
> > > (protection against OOM)
> > > socket.request.max.bytes=104857600
> > >
> > >
> > > # Log Basics #
> > >
> > > # A comma seperated list of directories under which to store log files
> > > log.dirs=/tmp/kafka-logs
> > >
> > > # The number of logical partitions per topic per server. More
> partitions
> > > allow greater parallelism
> > > # for consumption, but also mean more files.
> > > num.partitions=2
> > >
> > > # Log Flush Policy
> > > #
> > >
> > > # The following configurations control the 

Kafka User Group meeting Link

2015-12-17 Thread prabhu v
Hi,

Can anyone provide me the links for the Kafka user group meetings which
happened on Jun. 14, 2012 and June 3, 2014?

The link provided in the wiki page below is not a valid one.

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations




-- 
Regards,

Prabhu.V



Re: reassign __consumer_offsets partitions

2015-12-17 Thread Ben Stopford
Hi Damian

The reassignment should treat the offsets topic as any other topic. I did a 
quick test and it seemed to work for me. Do you see anything suspicious in the 
controller log?

B
> On 16 Dec 2015, at 14:51, Damian Guy  wrote:
> 
> Hi,
> 
> 
> We have had some temporary nodes in our kafka cluster and i now need to
> move assigned partitions off of those nodes onto the permanent members. I'm
> familiar with the kafka-reassign-partitions script, but ... How do i get it
> to work with the __consumer_offsets partition? It currently seems to ignore
> it.
> 
> Thanks,
> Damian



Re: Low-latency, high message size variance

2015-12-17 Thread Jason Gustafson
Hi Jens,

Gut feeling is that it's not a trivial patch, but you're more than welcome
to take a shot. We should probably do a KIP also since it's a change to a
public API (even if we only just released it). That's also a good way to
get feedback and make sure we're not missing a better approach. Here's a
link to the KIP guide:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals.
Let me know if I can help out.

-Jason
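One concern in the discussion quoted below is that handling records one at a
time clashes with auto-commit, because the committed position can move past
messages that have not actually been processed. A minimal sketch of the usual
workaround with only the released 0.9 consumer API, assuming placeholder
connection settings and topic, and with process() standing in for the slow
per-message work: disable auto-commit and commit each record's offset
explicitly once it has been handled.

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class PerRecordCommit {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption: local broker
        props.put("group.id", "slow-processing-group");     // placeholder group
        props.put("enable.auto.commit", "false");           // commit manually instead
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                    // Commit only this record's offset (+1 = next offset to read),
                    // so a crash never skips a message that was not processed.
                    consumer.commitSync(Collections.singletonMap(
                            new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1)));
                }
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value()); // stand-in for the real work
    }
}

This keeps the committed offsets honest, but it does not by itself address the
session-timeout problem discussed below; a slow process() can still cause the
consumer to be kicked out of the group.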

On Wed, Dec 16, 2015 at 2:07 PM, Jens Rantil  wrote:

> Hi again Jason,
>
> Once again thanks for your answer.
>
> Yes, the more I think/read about this it sounds like the "max records"
> approach is more viable. Without knowing the code, I guess it would make
> more sense to create a "max.partition.fetch.messages" property. That way a
> consumer could optimize for quick fetch on startup instead of per-poll()
> call. And if a consumer really would like to change the number of messages
> realtime, they could simply close the consumer and restart it.
>
> I spent 45 minutes trying to set up a development environment to have a
> look at the Kafka code and maybe submit a pull request for this. Do you
> think this would be hard to implement? Would introducing this need a larger
> consensus/discussion in KAFKA-2986?
>
> Last, but not least, I'm happy to hear that this case is something that
> Kafka should handle. Of the many queueing solutions I've reviewed, Kafka
> really seems like the absolute best solution to our problem, as long as we
> can overcome this issue.
>
> Thanks,
> Jens
>
> On Tuesday, December 15, 2015, Jason Gustafson  > wrote:
>
> > I was talking with Jay this afternoon about this use case. The tricky
> thing
> > about adding a ping() or heartbeat() API is that you have to deal with
> the
> > potential for rebalancing. This means either allowing it to block while a
> > rebalance completes or having it raise an exception indicating that a
> > rebalance is needed. In code, the latter might look like this:
> >
> > while (running) {
> >   ConsumerRecords records = consumer.poll(1000);
> >   try {
> > for (ConsumerRecord record : records) {
> >   process(record);
> >   consumer.heartbeat();
> > }
> >   } catch (RebalanceException e){
> > continue;
> >   }
> > }
> >
> > Unfortunately, this wouldn't work with auto-commit since it would tend to
> > break message processing early which would let the committed position get
> > ahead of the last offset processed. The alternative blocking approach
> > wouldn't be any better in this regard. Overall, it seems like this might
> > introduce a bigger problem than it solves.
> >
> > Perhaps the simpler solution is to provide a way to set the maximum
> number
> > of messages returned. This could either be a new configuration option or
> a
> > second argument in poll, but it would let you handle messages one-by-one
> if
> > you needed to. You'd then be able to set the session timeout according to
> > the expected time to handle a single message. It'd be a bit more work to
> > implement this, but if the use case is common enough, it might be
> > worthwhile.
> >
> > -Jason
> >
> > On Tue, Dec 15, 2015 at 10:31 AM, Jason Gustafson 
> > wrote:
> >
> > > Hey Jens,
> > >
> > > The purpose of pause() is to stop fetches for a set of partitions. This
> > > lets you continue calling poll() to send heartbeats. Also note that
> > poll()
> > > generally only blocks for rebalances. In code, something like this is
> > what
> > > I was thinking:
> > >
> > > while (running) {
> > >   ConsumerRecords records = consumer.poll(1000);
> > >   if (queue.offer(records))
> > > continue;
> > >
> > >   TopicPartition[] assignment = toArray(consumer.assignment());
> > >   consumer.pause(assignment);
> > >   while (!queue.offer(records, heartbeatIntervalMs,
> > TimeUnit.MILLISECONDS))
> > > consumer.poll(0);
> > >   consumer.resume(assignment);
> > > }
> > >
> > > The tricky thing is handling rebalances since they might occur in
> either
> > > call to poll(). In a rebalance, you have to 1) drain the queue, 2)
> commit
> > > current offsets, and 3) maybe break from the inner poll loop. If the
> > > processing thread is busy when the rebalance is triggered, then you may
> > > have to discard the results when it's finished. It's also a little
> > > difficult communicating completion to the poll loop, which is where the
> > > offset commit needs to take place. I suppose another queue would work,
> > > sigh.
> > >
> > > Well, I think you can make that work, but I tend to agree that it's
> > pretty
> > > complicated. Perhaps instead of a queue, you should just submit the
> > > processor to an executor service for each record set returned and await
> > its
> > > completion directly. For example:
> > >
> > > while (running) {
> > >   ConsumerRecords records = consumer.poll(1000);
> > >   Future future = executor.submit(new 

Re: Fallout from upgrading to kafka 0.9 from 0.8.2.3

2015-12-17 Thread Rajiv Kurian
Hi Jun,
Answers inline:

On Thu, Dec 17, 2015 at 9:41 AM, Jun Rao  wrote:

> Rajiv,
>
> Thanks for reporting this.
>
> 1. How did you verify that 3 of the topics are corrupted? Did you use
> DumpLogSegments tool? Also, is there a simple way to reproduce the
> corruption?
>
No, I did not. The only reason I had to believe that was that no writers
could write to the topic. I actually have no idea what the problem was. I saw
very frequent (much more than usual) messages of the form:
INFO  [kafka-request-handler-2] [kafka.server.KafkaApis
  ]: [KafkaApi-6] Close connection due to error handling produce
request with correlation id 294218 from client id  with ack=0
and also messages of the form:
INFO  [kafka-network-thread-9092-0] [kafka.network.Processor
  ]: Closing socket connection to /some ip
The cluster was actually a critical one so I had no recourse but to revert
the change (which like noted didn't fix things). I didn't have enough time
to debug further. The only way I could fix it with my limited Kafka
knowledge was (after reverting) deleting the topic and recreating it.
I had updated a low priority cluster before that worked just fine. That
gave me the confidence to upgrade this higher priority cluster which did
NOT work out. So the only way for me to try to reproduce it is to try this
on our larger clusters again. But it is critical that we don't mess up this
high priority cluster so I am afraid to try again.

> 2. As Lance mentioned, if you are using snappy, make sure that you include
> the right snappy jar (1.1.1.7).
>
Wonder why I don't see Lance's email in this thread. Either way we are not
using compression of any kind on this topic.

> 3. For the CPU issue, could you do a bit profiling to see which thread is
> busy and where it's spending time?
>
Since I had to revert I didn't have the time to profile. Intuitively it
would seem like the high number of client disconnects/errors and the
increased network usage probably has something to do with the high CPU
(total guess). Again our other (lower traffic) cluster that was upgraded
was totally fine so it doesn't seem like it happens all the time.

>
> Jun
>
>
> On Tue, Dec 15, 2015 at 12:52 PM, Rajiv Kurian  wrote:
>
> > We had to revert to 0.8.3 because three of our topics seem to have gotten
> > corrupted during the upgrade. As soon as we did the upgrade producers to
> > the three topics I mentioned stopped being able to do writes. The clients
> > complained (occasionally) about leader not found exceptions. We restarted
> > our clients and brokers but that didn't seem to help. Actually even after
> > reverting to 0.8.3 these three topics were broken. To fix it we had to
> stop
> > all clients, delete the topics, create them again and then restart the
> > clients.
> >
> > I realize this is not a lot of info. I couldn't wait to get more debug
> info
> > because the cluster was actually being used. Has any one run into
> something
> > like this? Are there any known issues with old consumers/producers. The
> > topics that got busted had clients writing to them using the old Java
> > wrapper over the Scala producer.
> >
> > Here are the steps I took to upgrade.
> >
> > For each broker:
> >
> > 1. Stop the broker.
> > 2. Restart with the 0.9 broker running with
> > inter.broker.protocol.version=0.8.2.X
> > 3. Wait for under replicated partitions to go down to 0.
> > 4. Go to step 1.
> > Once all the brokers were running the 0.9 code with
> > inter.broker.protocol.version=0.8.2.X we restarted them one by one with
> > inter.broker.protocol.version=0.9.0.0
> >
> > When reverting I did the following.
> >
> > For each broker.
> >
> > 1. Stop the broker.
> > 2. Restart with the 0.9 broker running with
> > inter.broker.protocol.version=0.8.2.X
> > 3. Wait for under replicated partitions to go down to 0.
> > 4. Go to step 1.
> >
> > Once all the brokers were running 0.9 code with
> > inter.broker.protocol.version=0.8.2.X  I restarted them one by one with
> the
> > 0.8.2.3 broker code. This however like I mentioned did not fix the three
> > broken topics.
> >
> >
> > On Mon, Dec 14, 2015 at 3:13 PM, Rajiv Kurian 
> wrote:
> >
> > > Now that it has been a bit longer, the spikes I was seeing are gone but
> > > the CPU and network in/out on the three brokers that were showing the
> > > spikes are still much higher than before the upgrade. Their CPUs have
> > > increased from around 1-2% to 12-20%. The network in on the same
> brokers
> > > has gone up from under 2 Mb/sec to 19-33 Mb/sec. The network out has
> gone
> > > up from under 2 Mb/sec to 29-42 Mb/sec. I don't see a corresponding
> > > increase in kafka messages in per second or kafka bytes in per second
> JMX
> > > metrics.
> > >
> > > Thanks,
> > > Rajiv
> > >
> >
>


Re: Fallout from upgrading to kafka 0.9 from 0.8.2.3

2015-12-17 Thread Jun Rao
Rajiv,

Thanks for reporting this.

1. How did you verify that 3 of the topics are corrupted? Did you use
DumpLogSegments tool? Also, is there a simple way to reproduce the
corruption?
2. As Lance mentioned, if you are using snappy, make sure that you include
the right snappy jar (1.1.1.7).
3. For the CPU issue, could you do a bit profiling to see which thread is
busy and where it's spending time?

Jun


On Tue, Dec 15, 2015 at 12:52 PM, Rajiv Kurian  wrote:

> We had to revert to 0.8.3 because three of our topics seem to have gotten
> corrupted during the upgrade. As soon as we did the upgrade producers to
> the three topics I mentioned stopped being able to do writes. The clients
> complained (occasionally) about leader not found exceptions. We restarted
> our clients and brokers but that didn't seem to help. Actually even after
> reverting to 0.8.3 these three topics were broken. To fix it we had to stop
> all clients, delete the topics, create them again and then restart the
> clients.
>
> I realize this is not a lot of info. I couldn't wait to get more debug info
> because the cluster was actually being used. Has any one run into something
> like this? Are there any known issues with old consumers/producers. The
> topics that got busted had clients writing to them using the old Java
> wrapper over the Scala producer.
>
> Here are the steps I took to upgrade.
>
> For each broker:
>
> 1. Stop the broker.
> 2. Restart with the 0.9 broker running with
> inter.broker.protocol.version=0.8.2.X
> 3. Wait for under replicated partitions to go down to 0.
> 4. Go to step 1.
> Once all the brokers were running the 0.9 code with
> inter.broker.protocol.version=0.8.2.X we restarted them one by one with
> inter.broker.protocol.version=0.9.0.0
>
> When reverting I did the following.
>
> For each broker.
>
> 1. Stop the broker.
> 2. Restart with the 0.9 broker running with
> inter.broker.protocol.version=0.8.2.X
> 3. Wait for under replicated partitions to go down to 0.
> 4. Go to step 1.
>
> Once all the brokers were running 0.9 code with
> inter.broker.protocol.version=0.8.2.X  I restarted them one by one with the
> 0.8.2.3 broker code. This however like I mentioned did not fix the three
> broken topics.
>
>
> On Mon, Dec 14, 2015 at 3:13 PM, Rajiv Kurian  wrote:
>
> > Now that it has been a bit longer, the spikes I was seeing are gone but
> > the CPU and network in/out on the three brokers that were showing the
> > spikes are still much higher than before the upgrade. Their CPUs have
> > increased from around 1-2% to 12-20%. The network in on the same brokers
> > has gone up from under 2 Mb/sec to 19-33 Mb/sec. The network out has gone
> > up from under 2 Mb/sec to 29-42 Mb/sec. I don't see a corresponding
> > increase in kafka messages in per second or kafka bytes in per second JMX
> > metrics.
> >
> > Thanks,
> > Rajiv
> >
>


Re: failed with LeaderNotAvailableError -

2015-12-17 Thread Dana Powers
Hi Ben and Marko -- great suggestions re: connection failures and docker.

The specific error here is: LeaderNotAvailableError:
TopicMetadata(topic='topic-test-production', error=5, partitions=[])

That is an error code (5) returned from a MetadataRequest. In this context
it means that the topic did not exist and so the request triggered an
auto-create initialization (i.e., the connection was fine). Topic
initialization tends to take a few seconds to complete, but only needs to
happen once per topic. A retry here is generally fine. This retry should
probably be handled under the covers by the client code. So in this case I
would treat it as a simple kafka-python issue (#488).

-Dana
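For readers hitting this from the Java clients rather than kafka-python, a
minimal sketch of the same idea, assuming a 0.9 Java producer against a
placeholder broker: partitionsFor() blocks, up to max.block.ms, until metadata
for the topic is available, and the metadata request itself triggers
auto-creation when the broker allows it, so calling it once before the first
send absorbs the few-second topic initialization described above.

import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.PartitionInfo;

public class TopicWarmup {

    // Hypothetical helper: fetch topic metadata once so an auto-created topic
    // has time to finish initialization before the first real send.
    static void warmUp(KafkaProducer<String, String> producer, String topic) {
        List<PartitionInfo> partitions = producer.partitionsFor(topic);
        System.out.println(topic + " has " + partitions.size() + " partitions");
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            warmUp(producer, "topic-test-production");
            producer.send(new ProducerRecord<>("topic-test-production", "key", "value"));
        }
    }
}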

On Thu, Dec 17, 2015 at 4:58 AM, Ben Davison 
wrote:

> I probably should have mentioned that this was using Amazon ECS.
>
> On Thu, Dec 17, 2015 at 12:18 PM, Marko Bonaći 
> wrote:
>
> > It doesn't have to be FQDN.
> >
> > Here's how I run Kafka in a container:
> > docker run --name st-kafka -p 2181:2181 -p 9092:9092 -e
> > ADVERTISED_HOST=`docker-machine ip dev-st` -e ADVERTISED_PORT=9092 -d
> > spotify/kafka
> >
> > And then you have access to Kafka on the docker host VM from any other
> > machine.
> > BTW I use Spotify's image since it contains both ZK and Kafka, but I
> think
> > the latest version they built is 0.8.2.1, so you might have to build the
> > new image yourself if you need 0.9, but that's trivial to do.
> >
> > Marko Bonaći
> > Monitoring | Alerting | Anomaly Detection | Centralized Log Management
> > Solr & Elasticsearch Support
> > Sematext  | Contact
> > 
> >
> > On Thu, Dec 17, 2015 at 11:33 AM, Ben Davison 
> > wrote:
> >
> > > Hi David,
> > >
> > > Are you running in docker? Are you trying to connect from a remote
> > box?
> > > We found we could connect locally but couldn't connect from another
> > remote
> > > host.
> > >
> > > (I've just started using kafka also)
> > >
> > > We had the same issue and found out: host.name=<%=@ipaddress%> needed
> to
> > > be
> > > the FQDN of the box.
> > >
> > > Thanks,
> > >
> > > Ben
> > >
> > > On Thu, Dec 17, 2015 at 5:40 AM, David Montgomery <
> > > davidmontgom...@gmail.com
> > > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I am very concerned about using kafka in production given the below
> > > > errors:
> > > >
> > > > Now issues with my zookeeper.  Other services use ZK.  Only kafka
> > fails.
> > > > I have 2 kafka servers using 8.x.  How do I resolve?  I tried
> > restarting
> > > > services for kafka.  Below is my kafka server.properties file
> > > >
> > > > 'Traceback (most recent call last):
> > > >   File
> > > >
> > > >
> > >
> >
> "/usr/local/lib/python2.7/dist-packages/gevent-1.1b6-py2.7-linux-x86_64.egg/gevent/greenlet.py",
> > > > line 523, in run
> > > > result = self._run(*self.args, **self.kwargs)
> > > >   File "/var/feed-server/ad-server/pixel-server.py", line 145, in
> > > > send_kafka_message
> > > > res = producer.send_messages(topic, message)
> > > >   File "build/bdist.linux-x86_64/egg/kafka/producer/simple.py", line
> > 52,
> > > in
> > > > send_messages
> > > > partition = self._next_partition(topic)
> > > >   File "build/bdist.linux-x86_64/egg/kafka/producer/simple.py", line
> > 36,
> > > in
> > > > _next_partition
> > > > self.client.load_metadata_for_topics(topic)
> > > >   File "build/bdist.linux-x86_64/egg/kafka/client.py", line 383, in
> > > > load_metadata_for_topics
> > > > kafka.common.check_error(topic_metadata)
> > > >   File "build/bdist.linux-x86_64/egg/kafka/common.py", line 233, in
> > > > check_error
> > > > raise error_class(response)
> > > > LeaderNotAvailableError: TopicMetadata(topic='topic-test-production',
> > > > error=5, partitions=[])
> > > >  > send_kafka_message('topic-test-production',
> > > > '{"adfadfadf)> failed with LeaderNotAvailableError
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > # limitations under the License.
> > > > # see kafka.server.KafkaConfig for additional details and defaults
> > > >
> > > > # Server Basics
> > #
> > > >
> > > > # The id of the broker. This must be set to a unique integer for each
> > > > broker.
> > > > broker.id=<%=@broker_id%>
> > > > advertised.host.name=<%=@ipaddress%>
> > > > advertised.port=9092
> > > > # Socket Server Settings
> > > > #
> > > >
> > > > # The port the socket server listens on
> > > > port=9092
> > > >
> > > > # Hostname the broker will bind to and advertise to producers and
> > > > consumers.
> > > > # If not set, the server will bind to all interfaces and advertise
> the
> > > > value returned from
> > > > # from java.net.InetAddress.getCanonicalHostName().
> > > > host.name=<%=@ipaddress%>
> > > >
> > > > # The number of threads handling network 

Re: Kafka User Group meeting Link

2015-12-17 Thread Jens Rantil
Hi,


In which part of the world?




Cheers,

Jens





–
Sent from Mailbox

On Thu, Dec 17, 2015 at 8:23 AM, prabhu v 
wrote:

> Hi,
> Can anyone provide me the link for the KAFKA USER Group meetings which
> happened on Jun. 14, 2012 and June 3, 2014??
> Link provided in the below wiki page is not a valid one..
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations
> -- 
> Regards,
> Prabhu.V
> -- 
> Regards,
> Prabhu.V

Re: Fallout from upgrading to kafka 0.9 from 0.8.2.3

2015-12-17 Thread Rajiv Kurian
The topic which stopped working had clients that were only using the old
Java producer that is a wrapper over the Scala producer. Again it seemed to
work perfectly in another of our realms where we have the same topics, same
producers/consumers etc but with less traffic.

On Thu, Dec 17, 2015 at 12:23 PM, Jun Rao  wrote:

> Are you using the new java producer?
>
> Thanks,
>
> Jun
>
> On Thu, Dec 17, 2015 at 9:58 AM, Rajiv Kurian  wrote:
>
> > Hi Jun,
> > Answers inline:
> >
> > On Thu, Dec 17, 2015 at 9:41 AM, Jun Rao  wrote:
> >
> > > Rajiv,
> > >
> > > Thanks for reporting this.
> > >
> > > 1. How did you verify that 3 of the topics are corrupted? Did you use
> > > DumpLogSegments tool? Also, is there a simple way to reproduce the
> > > corruption?
> > >
> > No I did not. The only reason I had to believe that was no writers could
> > write to the topic. I have actually no idea what the problem was. I saw
> > very frequent (much more than usual) messages of the form:
> > INFO  [kafka-request-handler-2] [kafka.server.KafkaApis
> >   ]: [KafkaApi-6] Close connection due to error handling produce
> > request with correlation id 294218 from client id  with ack=0
> > and also message of the form:
> > INFO  [kafka-network-thread-9092-0] [kafka.network.Processor
> >   ]: Closing socket connection to /some ip
> > The cluster was actually a critical one so I had no recourse but to
> revert
> > the change (which like noted didn't fix things). I didn't have enough
> time
> > to debug further. The only way I could fix it with my limited Kafka
> > knowledge was (after reverting) deleting the topic and recreating it.
> > I had updated a low priority cluster before that worked just fine. That
> > gave me the confidence to upgrade this higher priority cluster which did
> > NOT work out. So the only way for me to try to reproduce it is to try
> this
> > on our larger clusters again. But it is critical that we don't mess up
> this
> > high priority cluster so I am afraid to try again.
> >
> > > 2. As Lance mentioned, if you are using snappy, make sure that you
> > include
> > > the right snappy jar (1.1.1.7).
> > >
> > Wonder why I don't see Lance's email in this thread. Either way we are
> not
> > using compression of any kind on this topic.
> >
> > > 3. For the CPU issue, could you do a bit profiling to see which thread
> is
> > > busy and where it's spending time?
> > >
> > Since I had to revert I didn't have the time to profile. Intuitively it
> > would seem like the high number of client disconnects/errors and the
> > increased network usage probably has something to do with the high CPU
> > (total guess). Again our other (lower traffic) cluster that was upgraded
> > was totally fine so it doesn't seem like it happens all the time.
> >
> > >
> > > Jun
> > >
> > >
> > > On Tue, Dec 15, 2015 at 12:52 PM, Rajiv Kurian 
> > wrote:
> > >
> > > > We had to revert to 0.8.3 because three of our topics seem to have
> > gotten
> > > > corrupted during the upgrade. As soon as we did the upgrade producers
> > to
> > > > the three topics I mentioned stopped being able to do writes. The
> > clients
> > > > complained (occasionally) about leader not found exceptions. We
> > restarted
> > > > our clients and brokers but that didn't seem to help. Actually even
> > after
> > > > reverting to 0.8.3 these three topics were broken. To fix it we had
> to
> > > stop
> > > > all clients, delete the topics, create them again and then restart
> the
> > > > clients.
> > > >
> > > > I realize this is not a lot of info. I couldn't wait to get more
> debug
> > > info
> > > > because the cluster was actually being used. Has any one run into
> > > something
> > > > like this? Are there any known issues with old consumers/producers.
> The
> > > > topics that got busted had clients writing to them using the old Java
> > > > wrapper over the Scala producer.
> > > >
> > > > Here are the steps I took to upgrade.
> > > >
> > > > For each broker:
> > > >
> > > > 1. Stop the broker.
> > > > 2. Restart with the 0.9 broker running with
> > > > inter.broker.protocol.version=0.8.2.X
> > > > 3. Wait for under replicated partitions to go down to 0.
> > > > 4. Go to step 1.
> > > > Once all the brokers were running the 0.9 code with
> > > > inter.broker.protocol.version=0.8.2.X we restarted them one by one
> with
> > > > inter.broker.protocol.version=0.9.0.0
> > > >
> > > > When reverting I did the following.
> > > >
> > > > For each broker.
> > > >
> > > > 1. Stop the broker.
> > > > 2. Restart with the 0.9 broker running with
> > > > inter.broker.protocol.version=0.8.2.X
> > > > 3. Wait for under replicated partitions to go down to 0.
> > > > 4. Go to step 1.
> > > >
> > > > Once all the brokers were running 0.9 code with
> > > > inter.broker.protocol.version=0.8.2.X  I restarted them one by one
> with
> > > the
> > > > 0.8.2.3 broker code. This however 

Re: how to programmatically monitor Kafka availability

2015-12-17 Thread hsy...@gmail.com
Hey Hohl,

I use the partitionsFor method to monitor the partition info for particular
topics.
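A minimal sketch of that approach with a dedicated 0.9 Java producer used
purely as a probe (broker addresses, topic name and polling interval are
placeholders): partitionsFor() refreshes the partition metadata, a partition
whose leader() is null cannot currently accept produce requests, and an
unreachable cluster should surface as a timeout from the same call.

import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.PartitionInfo;

public class KafkaHeartbeat {

    // Returns true only if every partition of the topic currently has a leader.
    static boolean topicWritable(KafkaProducer<byte[], byte[]> probe, String topic) {
        List<PartitionInfo> partitions = probe.partitionsFor(topic);
        for (PartitionInfo p : partitions) {
            if (p.leader() == null) {
                return false; // no leader elected, so produce requests would fail
            }
        }
        return !partitions.isEmpty();
    }

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-1:9092,kafka-2:9092"); // placeholder brokers
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<byte[], byte[]> probe = new KafkaProducer<>(props)) {
            while (true) {
                System.out.println("orders writable: " + topicWritable(probe, "orders")); // placeholder topic
                Thread.sleep(30_000); // heartbeat interval
            }
        }
    }
}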



On Tue, Dec 15, 2015 at 11:27 AM, Hohl, Ken  wrote:

> We want to be able to monitor the ability to send messages to Kafka
> topics.  We want to be aware of the inability to do so before the time we
> attempt to send a message.  What we're looking for is something like a
> heartbeat.  The reason we need this is that in our deployment environment,
> Kafka and its clients will not be co-located.  As such, network issues
> could cause Kafka to not be available to its client.
>
> We've considered using Zookeeper that's already managing the Kafka cluster
> but have not been able to determine exactly how we would use it.
>
> We've also considered requesting a JMX MBean periodically and concluding
> the cluster is not accessible if we can't get the MBean from at least 1
> broker.
>
> What is the recommended way of accomplishing what we're trying to do?
>
> Thanks.
>
> Ken Hohl
> Cars.com
>
>


Where can I find the document for consumer metrics

2015-12-17 Thread hsy...@gmail.com
I can find some broker/producer metrics here
http://kafka.apache.org/documentation.html#monitoring

but where can I find the consumer metrics docs?

Every time I have to log this to find out what metrics I want:
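A minimal sketch of generating that list from a running 0.9 consumer
(connection settings are placeholders): metrics() returns every metric the
client registers, and printing group, name and description is a more readable
alternative to the raw map dump below.

import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class ListConsumerMetrics {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("group.id", "metrics-inspector");       // placeholder group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Print one line per registered metric: group / name: description
            for (Map.Entry<MetricName, ? extends Metric> entry : consumer.metrics().entrySet()) {
                MetricName name = entry.getKey();
                System.out.printf("%s / %s: %s%n", name.group(), name.name(), name.description());
            }
        }
    }
}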

MetricName [name=join-rate, group=consumer-coordinator-metrics,
description=The number of group joins per second,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@ebea716;MetricName
[name=fetch-size-avg, group=consumer-fetch-manager-metrics, description=The
average number of bytes fetched per request,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@6cb9cea;MetricName
[name=commit-latency-avg, group=consumer-coordinator-metrics,
description=The average time taken for a commit request,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@21aaca22;MetricName
[name=join-time-avg, group=consumer-coordinator-metrics, description=The
average time taken for a group rejoin,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@53bc8f72;MetricName
[name=incoming-byte-rate, group=consumer-metrics, description=Bytes/second
read off all sockets,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@34f9c3e0;MetricName
[name=bytes-consumed-rate, group=consumer-fetch-manager-metrics,
description=The average number of bytes consumed per second,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@36c7401a;MetricName
[name=response-rate, group=consumer-metrics, description=Responses received
sent per second.,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@5341870e;MetricName
[name=connection-creation-rate, group=consumer-metrics, description=New
connections established per second in the window.,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@9f0d8f4;MetricName
[name=fetch-rate, group=consumer-fetch-manager-metrics, description=The
number of fetch requests per second.,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@23338045;MetricName
[name=join-time-max, group=consumer-coordinator-metrics, description=The
max time taken for a group rejoin,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@5cdabd4d;MetricName
[name=io-wait-ratio, group=consumer-metrics, description=The fraction of
time the I/O thread spent waiting.,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@659b7186;MetricName
[name=fetch-size-max, group=consumer-fetch-manager-metrics, description=The
maximum number of bytes fetched per request,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@403a4887;MetricName
[name=assigned-partitions, group=consumer-coordinator-metrics,
description=The number of partitions currently assigned to this consumer,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@326fb802;MetricName
[name=io-time-ns-avg, group=consumer-metrics, description=The average
length of time for I/O per select call in nanoseconds.,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@432b0ee3;MetricName
[name=records-consumed-rate, group=consumer-fetch-manager-metrics,
description=The average number of records consumed per second,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@3fde7b88;MetricName
[name=io-wait-time-ns-avg, group=consumer-metrics, description=The average
length of time the I/O thread spent waiting for a socket ready for reads or
writes in nanoseconds.,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@5485cfd8;MetricName
[name=select-rate, group=consumer-metrics, description=Number of times the
I/O layer checked for new I/O to perform per second,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@2cbdcaf6;MetricName
[name=fetch-throttle-time-max, group=consumer-fetch-manager-metrics,
description=The maximum throttle time in ms,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@6057f36c;MetricName
[name=heartbeat-response-time-max, group=consumer-coordinator-metrics,
description=The max time taken to receive a response to a heartbeat
request,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@2e2e68de;MetricName
[name=network-io-rate, group=consumer-metrics, description=The average
number of network operations (reads or writes) on all connections per
second.,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@68e6de81;MetricName
[name=fetch-latency-max, group=consumer-fetch-manager-metrics,
description=The max time taken for any fetch request.,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@d1a1cf5;MetricName
[name=request-size-avg, group=consumer-metrics, description=The average
size of all requests in the window..,
tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@2d631f8b;MetricName
[name=commit-rate, group=consumer-coordinator-metrics, 

Re: Fallout from upgrading to kafka 0.9 from 0.8.2.3

2015-12-17 Thread Jun Rao
Are you using the new java producer?

Thanks,

Jun

On Thu, Dec 17, 2015 at 9:58 AM, Rajiv Kurian  wrote:

> Hi Jun,
> Answers inline:
>
> On Thu, Dec 17, 2015 at 9:41 AM, Jun Rao  wrote:
>
> > Rajiv,
> >
> > Thanks for reporting this.
> >
> > 1. How did you verify that 3 of the topics are corrupted? Did you use
> > DumpLogSegments tool? Also, is there a simple way to reproduce the
> > corruption?
> >
> No I did not. The only reason I had to believe that was no writers could
> write to the topic. I have actually no idea what the problem was. I saw
> very frequent (much more than usual) messages of the form:
> INFO  [kafka-request-handler-2] [kafka.server.KafkaApis
>   ]: [KafkaApi-6] Close connection due to error handling produce
> request with correlation id 294218 from client id  with ack=0
> and also message of the form:
> INFO  [kafka-network-thread-9092-0] [kafka.network.Processor
>   ]: Closing socket connection to /some ip
> The cluster was actually a critical one so I had no recourse but to revert
> the change (which like noted didn't fix things). I didn't have enough time
> to debug further. The only way I could fix it with my limited Kafka
> knowledge was (after reverting) deleting the topic and recreating it.
> I had updated a low priority cluster before that worked just fine. That
> gave me the confidence to upgrade this higher priority cluster which did
> NOT work out. So the only way for me to try to reproduce it is to try this
> on our larger clusters again. But it is critical that we don't mess up this
> high priority cluster so I am afraid to try again.
>
> > 2. As Lance mentioned, if you are using snappy, make sure that you
> include
> > the right snappy jar (1.1.1.7).
> >
> Wonder why I don't see Lance's email in this thread. Either way we are not
> using compression of any kind on this topic.
>
> > 3. For the CPU issue, could you do a bit profiling to see which thread is
> > busy and where it's spending time?
> >
> Since I had to revert I didn't have the time to profile. Intuitively it
> would seem like the high number of client disconnects/errors and the
> increased network usage probably has something to do with the high CPU
> (total guess). Again our other (lower traffic) cluster that was upgraded
> was totally fine so it doesn't seem like it happens all the time.
>
> >
> > Jun
> >
> >
> > On Tue, Dec 15, 2015 at 12:52 PM, Rajiv Kurian 
> wrote:
> >
> > > We had to revert to 0.8.3 because three of our topics seem to have
> gotten
> > > corrupted during the upgrade. As soon as we did the upgrade producers
> to
> > > the three topics I mentioned stopped being able to do writes. The
> clients
> > > complained (occasionally) about leader not found exceptions. We
> restarted
> > > our clients and brokers but that didn't seem to help. Actually even
> after
> > > reverting to 0.8.3 these three topics were broken. To fix it we had to
> > stop
> > > all clients, delete the topics, create them again and then restart the
> > > clients.
> > >
> > > I realize this is not a lot of info. I couldn't wait to get more debug
> > info
> > > because the cluster was actually being used. Has any one run into
> > something
> > > like this? Are there any known issues with old consumers/producers. The
> > > topics that got busted had clients writing to them using the old Java
> > > wrapper over the Scala producer.
> > >
> > > Here are the steps I took to upgrade.
> > >
> > > For each broker:
> > >
> > > 1. Stop the broker.
> > > 2. Restart with the 0.9 broker running with
> > > inter.broker.protocol.version=0.8.2.X
> > > 3. Wait for under replicated partitions to go down to 0.
> > > 4. Go to step 1.
> > > Once all the brokers were running the 0.9 code with
> > > inter.broker.protocol.version=0.8.2.X we restarted them one by one with
> > > inter.broker.protocol.version=0.9.0.0
> > >
> > > When reverting I did the following.
> > >
> > > For each broker.
> > >
> > > 1. Stop the broker.
> > > 2. Restart with the 0.9 broker running with
> > > inter.broker.protocol.version=0.8.2.X
> > > 3. Wait for under replicated partitions to go down to 0.
> > > 4. Go to step 1.
> > >
> > > Once all the brokers were running 0.9 code with
> > > inter.broker.protocol.version=0.8.2.X  I restarted them one by one with
> > the
> > > 0.8.2.3 broker code. This however like I mentioned did not fix the
> three
> > > broken topics.
> > >
> > >
> > > On Mon, Dec 14, 2015 at 3:13 PM, Rajiv Kurian 
> > wrote:
> > >
> > > > Now that it has been a bit longer, the spikes I was seeing are gone
> but
> > > > the CPU and network in/out on the three brokers that were showing the
> > > > spikes are still much higher than before the upgrade. Their CPUs have
> > > > increased from around 1-2% to 12-20%. The network in on the same
> > brokers
> > > > has gone up from under 2 Mb/sec to 19-33 Mb/sec. The network out has
> > gone
> > > > up from 

Re: Kafka 0.9 consumer API question

2015-12-17 Thread hsy...@gmail.com
Hi Rajiv,

I think it makes sense to return a read-only assignment. What we could
improve here is adding an addPartition method to the consumer; then we
wouldn't have to do any operations on the assignment returned by the
assignment() method.

BTW,
I think you can implement the PartitionAssignor interface to solve your use
case. I couldn't find the javadoc for that interface, but here is the method
you can use:

/**
 * Perform the group assignment given the member subscriptions and
 * current cluster metadata.
 * @param metadata Current topic/broker metadata known by consumer
 * @param subscriptions Subscriptions from all members provided through
 *        {@link #subscription(Set)}
 * @return A map from the members to their respective assignment. This
 *         should have one entry for all members in the input subscription map.
 */
Map<String, Assignment> assign(Cluster metadata, Map<String, Subscription> subscriptions);

The subscription map has each consumer's member id as key. It can be used
as a reference to the consumer and you can adjust the assignments there.
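Separately, a minimal sketch of the approach discussed in the quoted thread
below: keep the desired assignment in a set you own (with addPartition and
removePartition methods much like the addPartition idea above) and hand a
fresh copy to assign() whenever it changes, so the read-only set returned by
assignment() never needs to be modified. Only the released 0.9 API is used;
broker address and topic names are placeholders.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ManualAssignmentTracker {

    private final KafkaConsumer<String, String> consumer;
    // Single source of truth for what we want to consume; the consumer's own
    // assignment() is never consulted or modified.
    private final Set<TopicPartition> desired = new HashSet<>();

    ManualAssignmentTracker(KafkaConsumer<String, String> consumer) {
        this.consumer = consumer;
    }

    void addPartition(TopicPartition tp) {
        if (desired.add(tp)) {
            consumer.assign(new ArrayList<>(desired)); // the 0.9 assign() takes a List
        }
    }

    void removePartition(TopicPartition tp) {
        if (desired.remove(tp)) {
            consumer.assign(new ArrayList<>(desired));
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        ManualAssignmentTracker tracker = new ManualAssignmentTracker(consumer);
        tracker.addPartition(new TopicPartition("topic-a", 0)); // placeholder topic
        tracker.addPartition(new TopicPartition("topic-a", 1));
        tracker.removePartition(new TopicPartition("topic-a", 0));
        consumer.close();
    }
}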




On Tue, Dec 15, 2015 at 2:53 PM, Rajiv Kurian  wrote:

> Hi Jason,
>
> The copying is not a problem in terms of performance. It's just annoying to
> write the extra code. My point with the copy is that since the client is
> already making a copy when it returns the set to me, why would it matter if
> I modify the copy. Creating an unmodifiable set on top of a copy seems
> redundant. It would be easiest for us as users to do something like this:
>
> final Set<TopicPartition> partitions = consumer.assignment();  // This
> already returns a copy of the underlying assignment, thus ensuring that the
> internal data structures are protected.
> partitions.add(myNewTopicPartition);  // This is fine to modify since
> consumer.assignment() returns a copy.
> partitions.remove(topicPartitionToBeRemoved);
> consumer.assign(partitions);
>
> Instead we have to do something like this right now.
>
> final Set<TopicPartition> partitions = consumer.assignment();  // This
> returns a copy of the underlying assignment wrapped in an UnmodifiableSet
> which seems redundant.
> final Set<TopicPartition> yetAnotherCopy = new HashSet<>(partitions);  //
> We need this copy since consumer.assignment() is unmodifiable, even though
> it is a copy.
> yetAnotherCopy.add(myNewTopicPartition);
> yetAnotherCopy.remove(topicPartitionToBeRemoved);
> List<TopicPartition> wayTooManyCopies = new ArrayList<>(yetAnotherCopy);
> consumer.assign(wayTooManyCopies);
>
> Thanks,
> Rajiv
>
>
> On Tue, Dec 15, 2015 at 2:35 PM, Jason Gustafson 
> wrote:
>
> > Hey Rajiv,
> >
> > I agree the Set/List inconsistency is a little unfortunate (another
> > annoying one is pause() which uses a vararg). I think we should probably
> > add the following variants:
> >
> > assign(Collection<TopicPartition>)
> > subscribe(Collection<String>)
> > pause(Collection<TopicPartition>)
> >
> > I can open a JIRA to fix this. As for returning the unmodifiable set, I
> can
> > see your point, but I think it's a little dangerous for user code to
> depend
> > on being able to modify a collection returned from the API. Making it
> > immutable reduces the coupling with user code and gives us more freedom
> in
> > the future (not that we have any intention of changing the set type, but
> we
> > could). I think the way I might try to implement your use case would be
> to
> > maintain the assignment set yourself. You can make changes to that set
> and
> > always pass it to assign(), which would avoid the need to use
> assignment().
> > Also, I probably wouldn't be overly concerned about the copying overhead
> > unless profiling shows that it is actually a problem. Are your partition
> > assignments generally very large?
> >
> > -Jason
> >
> >
> > On Tue, Dec 15, 2015 at 1:32 PM, Rajiv Kurian 
> wrote:
> >
> > > We are trying to use the Kafka 0.9 consumer API to poll specific
> > > partitions. We consume partitions based on our own logic instead of
> > > delegating that to Kafka. One of our use cases is handling a change in
> > the
> > > partitions that we consume. This means that sometimes we need to
> consume
> > > additional partitions and other times we need to stop consuming (not
> > pause
> > > but stop entirely) some of the partitions that we are currently
> polling.
> > >
> > > The semantics of the assign() call at
> > >
> > >
> >
> http://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> > > is that we need to provide the entire list of subscriptions. So when we
> > > want to add or remove partitions we call the assignment() method to get
> > the
> > > existing set of TopicPartitions being polled, and then modify this set
> > and
> > > pass it back to the assign() call. However it seems weird that the
> > assign()
> > > call takes a List<TopicPartition> whereas the assignment() call returns
> a
> > > Set<TopicPartition>. Further the Set returned by the method is an
> > > unmodifiable set which means to change this set we need to create a 

Re: Fallout from upgrading to kafka 0.9 from 0.8.2.3

2015-12-17 Thread Jun Rao
Hmm, anything special with those 3 topics? Also, the broker log shows that
the producer uses ack=0, which means the producer shouldn't get errors like
leader not found. Could you clarify on the ack used by the producer?

Thanks,

Jun

On Thu, Dec 17, 2015 at 12:41 PM, Rajiv Kurian  wrote:

> The topic which stopped working had clients that were only using the old
> Java producer that is a wrapper over the Scala producer. Again it seemed to
> work perfectly in another of our realms where we have the same topics, same
> producers/consumers etc but with less traffic.
>
> On Thu, Dec 17, 2015 at 12:23 PM, Jun Rao  wrote:
>
> > Are you using the new java producer?
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Dec 17, 2015 at 9:58 AM, Rajiv Kurian 
> wrote:
> >
> > > Hi Jun,
> > > Answers inline:
> > >
> > > On Thu, Dec 17, 2015 at 9:41 AM, Jun Rao  wrote:
> > >
> > > > Rajiv,
> > > >
> > > > Thanks for reporting this.
> > > >
> > > > 1. How did you verify that 3 of the topics are corrupted? Did you use
> > > > DumpLogSegments tool? Also, is there a simple way to reproduce the
> > > > corruption?
> > > >
> > > No I did not. The only reason I had to believe that was no writers
> could
> > > write to the topic. I have actually no idea what the problem was. I saw
> > > very frequent (much more than usual) messages of the form:
> > > INFO  [kafka-request-handler-2] [kafka.server.KafkaApis
> > >   ]: [KafkaApi-6] Close connection due to error handling produce
> > > request with correlation id 294218 from client id  with ack=0
> > > and also message of the form:
> > > INFO  [kafka-network-thread-9092-0] [kafka.network.Processor
> > >   ]: Closing socket connection to /some ip
> > > The cluster was actually a critical one so I had no recourse but to
> > revert
> > > the change (which like noted didn't fix things). I didn't have enough
> > time
> > > to debug further. The only way I could fix it with my limited Kafka
> > > knowledge was (after reverting) deleting the topic and recreating it.
> > > I had updated a low priority cluster before that worked just fine. That
> > > gave me the confidence to upgrade this higher priority cluster which
> did
> > > NOT work out. So the only way for me to try to reproduce it is to try
> > this
> > > on our larger clusters again. But it is critical that we don't mess up
> > this
> > > high priority cluster so I am afraid to try again.
> > >
> > > > 2. As Lance mentioned, if you are using snappy, make sure that you
> > > include
> > > > the right snappy jar (1.1.1.7).
> > > >
> > > Wonder why I don't see Lance's email in this thread. Either way we are
> > not
> > > using compression of any kind on this topic.
> > >
> > > > 3. For the CPU issue, could you do a bit profiling to see which
> thread
> > is
> > > > busy and where it's spending time?
> > > >
> > > Since I had to revert I didn't have the time to profile. Intuitively it
> > > would seem like the high number of client disconnects/errors and the
> > > increased network usage probably has something to do with the high CPU
> > > (total guess). Again our other (lower traffic) cluster that was
> upgraded
> > > was totally fine so it doesn't seem like it happens all the time.
> > >
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Tue, Dec 15, 2015 at 12:52 PM, Rajiv Kurian 
> > > wrote:
> > > >
> > > > > We had to revert to 0.8.3 because three of our topics seem to have
> > > gotten
> > > > > corrupted during the upgrade. As soon as we did the upgrade
> producers
> > > to
> > > > > the three topics I mentioned stopped being able to do writes. The
> > > clients
> > > > > complained (occasionally) about leader not found exceptions. We
> > > restarted
> > > > > our clients and brokers but that didn't seem to help. Actually even
> > > after
> > > > > reverting to 0.8.3 these three topics were broken. To fix it we had
> > to
> > > > stop
> > > > > all clients, delete the topics, create them again and then restart
> > the
> > > > > clients.
> > > > >
> > > > > I realize this is not a lot of info. I couldn't wait to get more
> > debug
> > > > info
> > > > > because the cluster was actually being used. Has any one run into
> > > > something
> > > > > like this? Are there any known issues with old consumers/producers.
> > The
> > > > > topics that got busted had clients writing to them using the old
> Java
> > > > > wrapper over the Scala producer.
> > > > >
> > > > > Here are the steps I took to upgrade.
> > > > >
> > > > > For each broker:
> > > > >
> > > > > 1. Stop the broker.
> > > > > 2. Restart with the 0.9 broker running with
> > > > > inter.broker.protocol.version=0.8.2.X
> > > > > 3. Wait for under replicated partitions to go down to 0.
> > > > > 4. Go to step 1.
> > > > > Once all the brokers were running the 0.9 code with
> > > > > inter.broker.protocol.version=0.8.2.X we restarted them one by one
> > with
> > 

Re: Fallout from upgrading to kafka 0.9 from 0.8.2.3

2015-12-17 Thread Rajiv Kurian
I can't think of anything special about the topics besides the clients
being very old (Java wrappers over Scala).

I do think it was using ack=0. But my guess is that the logging was done by
the Kafka producer thread. My application itself was not getting exceptions
from Kafka.

On Thu, Dec 17, 2015 at 2:31 PM, Jun Rao  wrote:

> Hmm, anything special with those 3 topics? Also, the broker log shows that
> the producer uses ack=0, which means the producer shouldn't get errors like
> leader not found. Could you clarify on the ack used by the producer?
>
> Thanks,
>
> Jun
>
> On Thu, Dec 17, 2015 at 12:41 PM, Rajiv Kurian  wrote:
>
> > The topic which stopped working had clients that were only using the old
> > Java producer that is a wrapper over the Scala producer. Again it seemed
> to
> > work perfectly in another of our realms where we have the same topics,
> same
> > producers/consumers etc but with less traffic.
> >
> > On Thu, Dec 17, 2015 at 12:23 PM, Jun Rao  wrote:
> >
> > > Are you using the new java producer?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Dec 17, 2015 at 9:58 AM, Rajiv Kurian 
> > wrote:
> > >
> > > > Hi Jun,
> > > > Answers inline:
> > > >
> > > > On Thu, Dec 17, 2015 at 9:41 AM, Jun Rao  wrote:
> > > >
> > > > > Rajiv,
> > > > >
> > > > > Thanks for reporting this.
> > > > >
> > > > > 1. How did you verify that 3 of the topics are corrupted? Did you
> use
> > > > > DumpLogSegments tool? Also, is there a simple way to reproduce the
> > > > > corruption?
> > > > >
> > > > No I did not. The only reason I had to believe that was no writers
> > could
> > > > write to the topic. I have actually no idea what the problem was. I
> saw
> > > > very frequent (much more than usual) messages of the form:
> > > > INFO  [kafka-request-handler-2] [kafka.server.KafkaApis
> > > >   ]: [KafkaApi-6] Close connection due to error handling produce
> > > > request with correlation id 294218 from client id  with ack=0
> > > > and also message of the form:
> > > > INFO  [kafka-network-thread-9092-0] [kafka.network.Processor
> > > >   ]: Closing socket connection to /some ip
> > > > The cluster was actually a critical one so I had no recourse but to
> > > revert
> > > > the change (which like noted didn't fix things). I didn't have enough
> > > time
> > > > to debug further. The only way I could fix it with my limited Kafka
> > > > knowledge was (after reverting) deleting the topic and recreating it.
> > > > I had updated a low priority cluster before that worked just fine.
> That
> > > > gave me the confidence to upgrade this higher priority cluster which
> > did
> > > > NOT work out. So the only way for me to try to reproduce it is to try
> > > this
> > > > on our larger clusters again. But it is critical that we don't mess
> up
> > > this
> > > > high priority cluster so I am afraid to try again.
> > > >
> > > > > 2. As Lance mentioned, if you are using snappy, make sure that you
> > > > include
> > > > > the right snappy jar (1.1.1.7).
> > > > >
> > > > Wonder why I don't see Lance's email in this thread. Either way we
> are
> > > not
> > > > using compression of any kind on this topic.
> > > >
> > > > > 3. For the CPU issue, could you do a bit profiling to see which
> > thread
> > > is
> > > > > busy and where it's spending time?
> > > > >
> > > > Since I had to revert I didn't have the time to profile. Intuitively
> it
> > > > would seem like the high number of client disconnects/errors and the
> > > > increased network usage probably has something to do with the high
> CPU
> > > > (total guess). Again our other (lower traffic) cluster that was
> > upgraded
> > > > was totally fine so it doesn't seem like it happens all the time.
> > > >
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Tue, Dec 15, 2015 at 12:52 PM, Rajiv Kurian  >
> > > > wrote:
> > > > >
> > > > > > We had to revert to 0.8.3 because three of our topics seem to
> have
> > > > gotten
> > > > > > corrupted during the upgrade. As soon as we did the upgrade
> > producers
> > > > to
> > > > > > the three topics I mentioned stopped being able to do writes. The
> > > > clients
> > > > > > complained (occasionally) about leader not found exceptions. We
> > > > restarted
> > > > > > our clients and brokers but that didn't seem to help. Actually
> even
> > > > after
> > > > > > reverting to 0.8.3 these three topics were broken. To fix it we
> had
> > > to
> > > > > stop
> > > > > > all clients, delete the topics, create them again and then
> restart
> > > the
> > > > > > clients.
> > > > > >
> > > > > > I realize this is not a lot of info. I couldn't wait to get more
> > > debug
> > > > > info
> > > > > > because the cluster was actually being used. Has any one run into
> > > > > something
> > > > > > like this? Are there any known issues with old
> consumers/producers.
> > > The

Re: Kafka User Group meeting Link

2015-12-17 Thread prabhu v
I am based in India.

I am looking for the user group meetings listed below. I can access the 2nd
Kafka user group meeting, but not the 1st and 3rd.

User group meetings:

   - 1st Kafka user group meeting at LinkedIn, Jun. 14, 2012: video (part 1), video (part 2)
   - 2nd Kafka user group meeting at LinkedIn, Jun 27, 2013: video
   - 3rd Kafka user group meeting at LinkedIn, June 3, 2014: video



Regards,
Prabhu

On Thu, Dec 17, 2015 at 11:58 PM, Jens Rantil  wrote:

> Hi,
>
>
> In which part of the world?
>
>
>
>
> Cheers,
>
> Jens
>
>
>
>
>
> –
> Skickat från Mailbox
>
> On Thu, Dec 17, 2015 at 8:23 AM, prabhu v 
> wrote:
>
> > Hi,
> > Can anyone provide me the link for the KAFKA USER Group meetings which
> > happened on Jun. 14, 2012 and June 3, 2014??
> > Link provided in the below wiki page is not a valid one..
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations
> > --
> > Regards,
> > Prabhu.V
> > --
> > Regards,
> > Prabhu.V
>



-- 
Regards,

Prabhu.V


Re: Fallout from upgrading to kafka 0.9 from 0.8.2.3

2015-12-17 Thread Rajiv Kurian
I was mistaken about the version. We were actually using 0.8.2.1 before
upgrading to 0.9.

On Thu, Dec 17, 2015 at 6:13 PM, Dana Powers  wrote:

> I don't have much to add on this, but q: what is version 0.8.2.3? I thought
> the latest in 0.8 series was 0.8.2.2?
>
> -Dana
> On Dec 17, 2015 5:56 PM, "Rajiv Kurian"  wrote:
>
> > Yes we are in the process of upgrading to the new producers. But the
> > problem seems deeper than a compatibility issue. We have one environment
> > where the old producers work with the new 0.9 broker. Further when we
> > reverted our messed up 0.9 environment to 0.8.2.3 the problem with those
> > topics didn't go away.
> >
> > Didn't see any ZK issues on the brokers. There were other topics on the
> > very same brokers that didn't seem to be affected.
> >
> > On Thu, Dec 17, 2015 at 5:46 PM, Jun Rao  wrote:
> >
> > > Yes, the new java producer is available in 0.8.2.x and we recommend
> > people
> > > use that.
> > >
> > > Also, when those producers had the issue, were there any other things
> > weird
> > > in the broker (e.g., broker's ZK session expires)?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Dec 17, 2015 at 2:37 PM, Rajiv Kurian 
> > wrote:
> > >
> > > > I can't think of anything special about the topics besides the
> clients
> > > > being very old (Java wrappers over Scala).
> > > >
> > > > I do think it was using ack=0. But my guess is that the logging was
> > done
> > > by
> > > > the Kafka producer thread. My application itself was not getting
> > > exceptions
> > > > from Kafka.
> > > >
> > > > On Thu, Dec 17, 2015 at 2:31 PM, Jun Rao  wrote:
> > > >
> > > > > Hmm, anything special with those 3 topics? Also, the broker log
> shows
> > > > that
> > > > > the producer uses ack=0, which means the producer shouldn't get
> > errors
> > > > like
> > > > > leader not found. Could you clarify on the ack used by the
> producer?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Thu, Dec 17, 2015 at 12:41 PM, Rajiv Kurian  >
> > > > wrote:
> > > > >
> > > > > > The topic which stopped working had clients that were only using
> > the
> > > > old
> > > > > > Java producer that is a wrapper over the Scala producer. Again it
> > > > seemed
> > > > > to
> > > > > > work perfectly in another of our realms where we have the same
> > > topics,
> > > > > same
> > > > > > producers/consumers etc but with less traffic.
> > > > > >
> > > > > > On Thu, Dec 17, 2015 at 12:23 PM, Jun Rao 
> > wrote:
> > > > > >
> > > > > > > Are you using the new java producer?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Thu, Dec 17, 2015 at 9:58 AM, Rajiv Kurian <
> > ra...@signalfx.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > > Answers inline:
> > > > > > > >
> > > > > > > > On Thu, Dec 17, 2015 at 9:41 AM, Jun Rao 
> > > wrote:
> > > > > > > >
> > > > > > > > > Rajiv,
> > > > > > > > >
> > > > > > > > > Thanks for reporting this.
> > > > > > > > >
> > > > > > > > > 1. How did you verify that 3 of the topics are corrupted?
> Did
> > > you
> > > > > use
> > > > > > > > > DumpLogSegments tool? Also, is there a simple way to
> > reproduce
> > > > the
> > > > > > > > > corruption?
> > > > > > > > >
> > > > > > > > No I did not. The only reason I had to believe that was no
> > > writers
> > > > > > could
> > > > > > > > write to the topic. I have actually no idea what the problem
> > > was. I
> > > > > saw
> > > > > > > > very frequent (much more than usual) messages of the form:
> > > > > > > > INFO  [kafka-request-handler-2]
> > > [kafka.server.KafkaApis
> > > > > > > >   ]: [KafkaApi-6] Close connection due to error handling
> > > > produce
> > > > > > > > request with correlation id 294218 from client id  with ack=0
> > > > > > > > and also message of the form:
> > > > > > > > INFO  [kafka-network-thread-9092-0]
> > > > [kafka.network.Processor
> > > > > > > >   ]: Closing socket connection to /some ip
> > > > > > > > The cluster was actually a critical one so I had no recourse
> > but
> > > to
> > > > > > > revert
> > > > > > > > the change (which like noted didn't fix things). I didn't
> have
> > > > enough
> > > > > > > time
> > > > > > > > to debug further. The only way I could fix it with my limited
> > > Kafka
> > > > > > > > knowledge was (after reverting) deleting the topic and
> > recreating
> > > > it.
> > > > > > > > I had updated a low priority cluster before that worked just
> > > fine.
> > > > > That
> > > > > > > > gave me the confidence to upgrade this higher priority
> cluster
> > > > which
> > > > > > did
> > > > > > > > NOT work out. So the only way for me to try to reproduce it
> is
> > to
> > > > try
> > > > > > > this
> > > > > > > > on our larger clusters again. But it is critical 

Re: Where can I find the document for consumer metrics

2015-12-17 Thread Guozhang Wang
We should add a section for that. Siyuan can you file a JIRA?

Guozhang

On Thu, Dec 17, 2015 at 11:08 AM, hsy...@gmail.com  wrote:

> I can find some broker/producer metrics here
> http://kafka.apache.org/documentation.html#monitoring
>
> but where can I find consumer metrics docs
>
> Every time I have to log this to find out what metrics I want
>
> MetricName [name=join-rate, group=consumer-coordinator-metrics, description=The number of group joins per second, tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@ebea716
> ;MetricName [name=fetch-size-avg, group=consumer-fetch-manager-metrics, description=The average number of bytes fetched per request, tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@6cb9cea
> ;MetricName [name=commit-latency-avg, group=consumer-coordinator-metrics, description=The average time taken for a commit request, tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@21aaca22
> ;MetricName [name=join-time-avg, group=consumer-coordinator-metrics, description=The average time taken for a group rejoin, tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@53bc8f72
> ;MetricName [name=incoming-byte-rate, group=consumer-metrics, description=Bytes/second read off all sockets, tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@34f9c3e0
> ;MetricName [name=bytes-consumed-rate, group=consumer-fetch-manager-metrics, description=The average number of bytes consumed per second, tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@36c7401a
> ;MetricName [name=response-rate, group=consumer-metrics, description=Responses received sent per second., tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@5341870e
> ;MetricName [name=connection-creation-rate, group=consumer-metrics, description=New connections established per second in the window., tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@9f0d8f4
> ;MetricName [name=fetch-rate, group=consumer-fetch-manager-metrics, description=The number of fetch requests per second., tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@23338045
> ;MetricName [name=join-time-max, group=consumer-coordinator-metrics, description=The max time taken for a group rejoin, tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@5cdabd4d
> ;MetricName [name=io-wait-ratio, group=consumer-metrics, description=The fraction of time the I/O thread spent waiting., tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@659b7186
> ;MetricName [name=fetch-size-max, group=consumer-fetch-manager-metrics, description=The maximum number of bytes fetched per request, tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@403a4887
> ;MetricName [name=assigned-partitions, group=consumer-coordinator-metrics, description=The number of partitions currently assigned to this consumer, tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@326fb802
> ;MetricName [name=io-time-ns-avg, group=consumer-metrics, description=The average length of time for I/O per select call in nanoseconds., tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@432b0ee3
> ;MetricName [name=records-consumed-rate, group=consumer-fetch-manager-metrics, description=The average number of records consumed per second, tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@3fde7b88
> ;MetricName [name=io-wait-time-ns-avg, group=consumer-metrics, description=The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds., tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@5485cfd8
> ;MetricName [name=select-rate, group=consumer-metrics, description=Number of times the I/O layer checked for new I/O to perform per second, tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@2cbdcaf6
> ;MetricName [name=fetch-throttle-time-max, group=consumer-fetch-manager-metrics, description=The maximum throttle time in ms, tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@6057f36c
> ;MetricName [name=heartbeat-response-time-max, group=consumer-coordinator-metrics, description=The max time taken to receive a response to a heartbeat request, tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@2e2e68de
> ;MetricName [name=network-io-rate, group=consumer-metrics, description=The average number of network operations (reads or writes) on all connections per second., tags={client-id=consumer-4}]=org.apache.kafka.common.metrics.KafkaMetric@68e6de81
> ;MetricName [name=fetch-latency-max, group=consumer-fetch-manager-metrics, description=The 

Local Storage

2015-12-17 Thread Heath Ivie
Maybe someone can answer this question for me, because I cannot seem to find it.

What is the data store that Kafka uses when it writes the logs to disk?

I thought I saw a reference to KahaDB, but I am not sure if that is correct.


Heath Ivie
Solutions Architect


Warning: This e-mail may contain information proprietary to AutoAnything Inc. 
and is intended only for the use of the intended recipient(s). If the reader of 
this message is not the intended recipient(s), you have received this message 
in error and any review, dissemination, distribution or copying of this message 
is strictly prohibited. If you have received this message in error, please 
notify the sender immediately and delete all copies.


Re: Local Storage

2015-12-17 Thread Gwen Shapira
Hi,

Kafka *is* a data store. It writes data to files on the OS file system: one
directory per partition, and a new segment file after a configurable amount of
time (you can control this with log.roll.ms). The data format is specific to Kafka.
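For illustration only (the topic name is hypothetical; the directory is the log.dirs default shown in the configs quoted elsewhere in this digest), the layout is roughly one directory per topic-partition, with segment files named after the first offset they contain:

/tmp/kafka-logs/
  my-topic-0/
    00000000000000000000.index
    00000000000000000000.log
  my-topic-1/
    00000000000000000000.index
    00000000000000000000.log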

Hope this helps,

Gwen

On Thu, Dec 17, 2015 at 3:32 PM, Heath Ivie  wrote:

> Maybe someone can answer this question for me, because I cannot seem to find
> it.
>
> What is the data store that Kafka uses when it writes the logs to disk?
>
> I thought I saw a reference to KahaDB, but I am not sure if that is
> correct.
>
>
> Heath Ivie
> Solutions Architect
>
>


Re: failed with LeaderNotAvailableError -

2015-12-17 Thread Marko Bonaći
It doesn't have to be an FQDN.

Here's how I run Kafka in a container:
docker run --name st-kafka -p 2181:2181 -p 9092:9092 -e
ADVERTISED_HOST=`docker-machine ip dev-st` -e ADVERTISED_PORT=9092 -d
spotify/kafka

And then you have access to Kafka on the docker host VM from any other
machine.
BTW I use Spotify's image since it contains both ZK and Kafka, but I think
the latest version they built is 0.8.2.1, so you might have to build the
new image yourself if you need 0.9, but that's trivial to do.
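As a quick sanity check from another machine, something along these lines with the old kafka-python client that appears in the traceback earlier in this thread (the address is only an example of whatever docker-machine ip dev-st returns, not a value from this thread):

from kafka import KafkaClient, SimpleProducer

# Point at the advertised host:port of the broker running in the container.
client = KafkaClient('192.168.99.100:9092')  # example address only
client.ensure_topic_exists('topic-test-production')  # waits until metadata shows a leader

producer = SimpleProducer(client)
producer.send_messages('topic-test-production', b'hello from outside the container')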

Marko Bonaći
Monitoring | Alerting | Anomaly Detection | Centralized Log Management
Solr & Elasticsearch Support
Sematext  | Contact


On Thu, Dec 17, 2015 at 11:33 AM, Ben Davison 
wrote:

> Hi David,
>
> Are you running in docker? Are you trying to connect from to a remote box?
> We found we could connect locally but couldn't connect from another remote
> host.
>
> (I've just started using kafka also)
>
> We had the same issue and found out: host.name=<%=@ipaddress%> needed to
> be
> the FQDN of the box.
>
> Thanks,
>
> Ben
>
> On Thu, Dec 17, 2015 at 5:40 AM, David Montgomery <
> davidmontgom...@gmail.com
> > wrote:
>
> > Hi,
> >
> > I am very concerned about using kafka in production given the below
> > errors:
> >
> > Now issues with myt zookeeper.  Other services use ZK.  Only kafka fails.
> > I have 2 kafka servers using 8.x.  How do I resolve?  I tried restarting
> > services for kafka.  Below is my kafka server.properties file
> >
> > 'Traceback (most recent call last):
> >   File
> >
> >
> "/usr/local/lib/python2.7/dist-packages/gevent-1.1b6-py2.7-linux-x86_64.egg/gevent/greenlet.py",
> > line 523, in run
> > result = self._run(*self.args, **self.kwargs)
> >   File "/var/feed-server/ad-server/pixel-server.py", line 145, in
> > send_kafka_message
> > res = producer.send_messages(topic, message)
> >   File "build/bdist.linux-x86_64/egg/kafka/producer/simple.py", line 52,
> in
> > send_messages
> > partition = self._next_partition(topic)
> >   File "build/bdist.linux-x86_64/egg/kafka/producer/simple.py", line 36,
> in
> > _next_partition
> > self.client.load_metadata_for_topics(topic)
> >   File "build/bdist.linux-x86_64/egg/kafka/client.py", line 383, in
> > load_metadata_for_topics
> > kafka.common.check_error(topic_metadata)
> >   File "build/bdist.linux-x86_64/egg/kafka/common.py", line 233, in
> > check_error
> > raise error_class(response)
> > LeaderNotAvailableError: TopicMetadata(topic='topic-test-production',
> > error=5, partitions=[])
> >  > '{"adfadfadf)> failed with LeaderNotAvailableError
> >
> >
> >
> >
> >
> >
> >
> >
> > # limitations under the License.
> > # see kafka.server.KafkaConfig for additional details and defaults
> >
> > # Server Basics #
> >
> > # The id of the broker. This must be set to a unique integer for each
> > broker.
> > broker.id=<%=@broker_id%>
> > advertised.host.name=<%=@ipaddress%>
> > advertised.port=9092
> > # Socket Server Settings
> > #
> >
> > # The port the socket server listens on
> > port=9092
> >
> > # Hostname the broker will bind to and advertise to producers and
> > consumers.
> > # If not set, the server will bind to all interfaces and advertise the
> > value returned from
> > # from java.net.InetAddress.getCanonicalHostName().
> > host.name=<%=@ipaddress%>
> >
> > # The number of threads handling network requests
> > num.network.threads=2
> >
> > # The number of threads doing disk I/O
> > num.io.threads=2
> >
> > # The send buffer (SO_SNDBUF) used by the socket server
> > socket.send.buffer.bytes=1048576
> >
> > # The receive buffer (SO_RCVBUF) used by the socket server
> > socket.receive.buffer.bytes=1048576
> >
> > # The maximum size of a request that the socket server will accept
> > (protection against OOM)
> > socket.request.max.bytes=104857600
> >
> >
> > # Log Basics #
> >
> > # A comma seperated list of directories under which to store log files
> > log.dirs=/tmp/kafka-logs
> >
> > # The number of logical partitions per topic per server. More partitions
> > allow greater parallelism
> > # for consumption, but also mean more files.
> > num.partitions=2
> >
> > # Log Flush Policy
> > #
> >
> > # The following configurations control the flush of data to disk. This is
> > among the most
> > # important performance knob in kafka.
> > # There are a few important trade-offs here:
> > #1. Durability: Unflushed data may be lost if you are not using
> > replication.
> > #2. Latency: Very large flush intervals may lead to latency spikes
> when
> > the flush does occur as there will be a lot of data to flush.
> > #3. Throughput: The flush is generally the most expensive operation,
> > and a small flush interval may 

Re: Fallout from upgrading to kafka 0.9 from 0.8.2.3

2015-12-17 Thread Jun Rao
Yes, the new java producer is available in 0.8.2.x and we recommend people
use that.

Also, when those producers had the issue, were there any other things weird
in the broker (e.g., broker's ZK session expires)?

Thanks,

Jun

On Thu, Dec 17, 2015 at 2:37 PM, Rajiv Kurian  wrote:

> I can't think of anything special about the topics besides the clients
> being very old (Java wrappers over Scala).
>
> I do think it was using ack=0. But my guess is that the logging was done by
> the Kafka producer thread. My application itself was not getting exceptions
> from Kafka.
>
> On Thu, Dec 17, 2015 at 2:31 PM, Jun Rao  wrote:
>
> > Hmm, anything special with those 3 topics? Also, the broker log shows
> that
> > the producer uses ack=0, which means the producer shouldn't get errors
> like
> > leader not found. Could you clarify on the ack used by the producer?
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Dec 17, 2015 at 12:41 PM, Rajiv Kurian 
> wrote:
> >
> > > The topic which stopped working had clients that were only using the
> old
> > > Java producer that is a wrapper over the Scala producer. Again it
> seemed
> > to
> > > work perfectly in another of our realms where we have the same topics,
> > same
> > > producers/consumers etc but with less traffic.
> > >
> > > On Thu, Dec 17, 2015 at 12:23 PM, Jun Rao  wrote:
> > >
> > > > Are you using the new java producer?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Thu, Dec 17, 2015 at 9:58 AM, Rajiv Kurian 
> > > wrote:
> > > >
> > > > > Hi Jun,
> > > > > Answers inline:
> > > > >
> > > > > On Thu, Dec 17, 2015 at 9:41 AM, Jun Rao  wrote:
> > > > >
> > > > > > Rajiv,
> > > > > >
> > > > > > Thanks for reporting this.
> > > > > >
> > > > > > 1. How did you verify that 3 of the topics are corrupted? Did you
> > use
> > > > > > DumpLogSegments tool? Also, is there a simple way to reproduce
> the
> > > > > > corruption?
> > > > > >
> > > > > No I did not. The only reason I had to believe that was no writers
> > > could
> > > > > write to the topic. I have actually no idea what the problem was. I
> > saw
> > > > > very frequent (much more than usual) messages of the form:
> > > > > INFO  [kafka-request-handler-2] [kafka.server.KafkaApis
> > > > >   ]: [KafkaApi-6] Close connection due to error handling
> produce
> > > > > request with correlation id 294218 from client id  with ack=0
> > > > > and also message of the form:
> > > > > INFO  [kafka-network-thread-9092-0]
> [kafka.network.Processor
> > > > >   ]: Closing socket connection to /some ip
> > > > > The cluster was actually a critical one so I had no recourse but to
> > > > revert
> > > > > the change (which like noted didn't fix things). I didn't have
> enough
> > > > time
> > > > > to debug further. The only way I could fix it with my limited Kafka
> > > > > knowledge was (after reverting) deleting the topic and recreating
> it.
> > > > > I had updated a low priority cluster before that worked just fine.
> > That
> > > > > gave me the confidence to upgrade this higher priority cluster
> which
> > > did
> > > > > NOT work out. So the only way for me to try to reproduce it is to
> try
> > > > this
> > > > > on our larger clusters again. But it is critical that we don't mess
> > up
> > > > this
> > > > > high priority cluster so I am afraid to try again.
> > > > >
> > > > > > 2. As Lance mentioned, if you are using snappy, make sure that
> you
> > > > > include
> > > > > > the right snappy jar (1.1.1.7).
> > > > > >
> > > > > Wonder why I don't see Lance's email in this thread. Either way we
> > are
> > > > not
> > > > > using compression of any kind on this topic.
> > > > >
> > > > > > 3. For the CPU issue, could you do a bit profiling to see which
> > > thread
> > > > is
> > > > > > busy and where it's spending time?
> > > > > >
> > > > > Since I had to revert I didn't have the time to profile.
> Intuitively
> > it
> > > > > would seem like the high number of client disconnects/errors and
> the
> > > > > increased network usage probably has something to do with the high
> > CPU
> > > > > (total guess). Again our other (lower traffic) cluster that was
> > > upgraded
> > > > > was totally fine so it doesn't seem like it happens all the time.
> > > > >
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Tue, Dec 15, 2015 at 12:52 PM, Rajiv Kurian <
> ra...@signalfx.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > We had to revert to 0.8.3 because three of our topics seem to
> > have
> > > > > gotten
> > > > > > > corrupted during the upgrade. As soon as we did the upgrade
> > > producers
> > > > > to
> > > > > > > the three topics I mentioned stopped being able to do writes.
> The
> > > > > clients
> > > > > > > complained (occasionally) about leader not found exceptions. We
> > > > > restarted
> > > > > > > our clients and brokers but 

Re: Fallout from upgrading to kafka 0.9 from 0.8.2.3

2015-12-17 Thread Rajiv Kurian
Yes we are in the process of upgrading to the new producers. But the
problem seems deeper than a compatibility issue. We have one environment
where the old producers work with the new 0.9 broker. Further when we
reverted our messed up 0.9 environment to 0.8.2.3 the problem with those
topics didn't go away.

Didn't see any ZK issues on the brokers. There were other topics on the
very same brokers that didn't seem to be affected.

On Thu, Dec 17, 2015 at 5:46 PM, Jun Rao  wrote:

> Yes, the new java producer is available in 0.8.2.x and we recommend people
> use that.
>
> Also, when those producers had the issue, were there any other things weird
> in the broker (e.g., broker's ZK session expires)?
>
> Thanks,
>
> Jun
>
> On Thu, Dec 17, 2015 at 2:37 PM, Rajiv Kurian  wrote:
>
> > I can't think of anything special about the topics besides the clients
> > being very old (Java wrappers over Scala).
> >
> > I do think it was using ack=0. But my guess is that the logging was done
> by
> > the Kafka producer thread. My application itself was not getting
> exceptions
> > from Kafka.
> >
> > On Thu, Dec 17, 2015 at 2:31 PM, Jun Rao  wrote:
> >
> > > Hmm, anything special with those 3 topics? Also, the broker log shows
> > that
> > > the producer uses ack=0, which means the producer shouldn't get errors
> > like
> > > leader not found. Could you clarify on the ack used by the producer?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Dec 17, 2015 at 12:41 PM, Rajiv Kurian 
> > wrote:
> > >
> > > > The topic which stopped working had clients that were only using the
> > old
> > > > Java producer that is a wrapper over the Scala producer. Again it
> > seemed
> > > to
> > > > work perfectly in another of our realms where we have the same
> topics,
> > > same
> > > > producers/consumers etc but with less traffic.
> > > >
> > > > On Thu, Dec 17, 2015 at 12:23 PM, Jun Rao  wrote:
> > > >
> > > > > Are you using the new java producer?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Thu, Dec 17, 2015 at 9:58 AM, Rajiv Kurian 
> > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > > Answers inline:
> > > > > >
> > > > > > On Thu, Dec 17, 2015 at 9:41 AM, Jun Rao 
> wrote:
> > > > > >
> > > > > > > Rajiv,
> > > > > > >
> > > > > > > Thanks for reporting this.
> > > > > > >
> > > > > > > 1. How did you verify that 3 of the topics are corrupted? Did
> you
> > > use
> > > > > > > DumpLogSegments tool? Also, is there a simple way to reproduce
> > the
> > > > > > > corruption?
> > > > > > >
> > > > > > No I did not. The only reason I had to believe that was no
> writers
> > > > could
> > > > > > write to the topic. I have actually no idea what the problem
> was. I
> > > saw
> > > > > > very frequent (much more than usual) messages of the form:
> > > > > > INFO  [kafka-request-handler-2]
> [kafka.server.KafkaApis
> > > > > >   ]: [KafkaApi-6] Close connection due to error handling
> > produce
> > > > > > request with correlation id 294218 from client id  with ack=0
> > > > > > and also message of the form:
> > > > > > INFO  [kafka-network-thread-9092-0]
> > [kafka.network.Processor
> > > > > >   ]: Closing socket connection to /some ip
> > > > > > The cluster was actually a critical one so I had no recourse but
> to
> > > > > revert
> > > > > > the change (which like noted didn't fix things). I didn't have
> > enough
> > > > > time
> > > > > > to debug further. The only way I could fix it with my limited
> Kafka
> > > > > > knowledge was (after reverting) deleting the topic and recreating
> > it.
> > > > > > I had updated a low priority cluster before that worked just
> fine.
> > > That
> > > > > > gave me the confidence to upgrade this higher priority cluster
> > which
> > > > did
> > > > > > NOT work out. So the only way for me to try to reproduce it is to
> > try
> > > > > this
> > > > > > on our larger clusters again. But it is critical that we don't
> mess
> > > up
> > > > > this
> > > > > > high priority cluster so I am afraid to try again.
> > > > > >
> > > > > > > 2. As Lance mentioned, if you are using snappy, make sure that
> > you
> > > > > > include
> > > > > > > the right snappy jar (1.1.1.7).
> > > > > > >
> > > > > > Wonder why I don't see Lance's email in this thread. Either way
> we
> > > are
> > > > > not
> > > > > > using compression of any kind on this topic.
> > > > > >
> > > > > > > 3. For the CPU issue, could you do a bit profiling to see which
> > > > thread
> > > > > is
> > > > > > > busy and where it's spending time?
> > > > > > >
> > > > > > Since I had to revert I didn't have the time to profile.
> > Intuitively
> > > it
> > > > > > would seem like the high number of client disconnects/errors and
> > the
> > > > > > increased network usage probably has something to do with the
> high
> > > CPU
> > > > > > 

Re: failed with LeaderNotAvailableError -

2015-12-17 Thread David Montgomery
FYI I am using DigitalOcean.  I do not use Docker.

On Thu, Dec 17, 2015 at 6:33 PM, Ben Davison 
wrote:

> Hi David,
>
> Are you running in docker? Are you trying to connect from to a remote box?
> We found we could connect locally but couldn't connect from another remote
> host.
>
> (I've just started using kafka also)
>
> We had the same issue and found out: host.name=<%=@ipaddress%> needed to
> be
> the FQDN of the box.
>
> Thanks,
>
> Ben
>
> On Thu, Dec 17, 2015 at 5:40 AM, David Montgomery <
> davidmontgom...@gmail.com
> > wrote:
>
> > Hi,
> >
> > I am very concerned about using kafka in production given the below
> > errors:
> >
> > Now issues with myt zookeeper.  Other services use ZK.  Only kafka fails.
> > I have 2 kafka servers using 8.x.  How do I resolve?  I tried restarting
> > services for kafka.  Below is my kafka server.properties file
> >
> > 'Traceback (most recent call last):
> >   File
> >
> >
> "/usr/local/lib/python2.7/dist-packages/gevent-1.1b6-py2.7-linux-x86_64.egg/gevent/greenlet.py",
> > line 523, in run
> > result = self._run(*self.args, **self.kwargs)
> >   File "/var/feed-server/ad-server/pixel-server.py", line 145, in
> > send_kafka_message
> > res = producer.send_messages(topic, message)
> >   File "build/bdist.linux-x86_64/egg/kafka/producer/simple.py", line 52,
> in
> > send_messages
> > partition = self._next_partition(topic)
> >   File "build/bdist.linux-x86_64/egg/kafka/producer/simple.py", line 36,
> in
> > _next_partition
> > self.client.load_metadata_for_topics(topic)
> >   File "build/bdist.linux-x86_64/egg/kafka/client.py", line 383, in
> > load_metadata_for_topics
> > kafka.common.check_error(topic_metadata)
> >   File "build/bdist.linux-x86_64/egg/kafka/common.py", line 233, in
> > check_error
> > raise error_class(response)
> > LeaderNotAvailableError: TopicMetadata(topic='topic-test-production',
> > error=5, partitions=[])
> >  > '{"adfadfadf)> failed with LeaderNotAvailableError
> >
> >
> >
> >
> >
> >
> >
> >
> > # limitations under the License.
> > # see kafka.server.KafkaConfig for additional details and defaults
> >
> > # Server Basics #
> >
> > # The id of the broker. This must be set to a unique integer for each
> > broker.
> > broker.id=<%=@broker_id%>
> > advertised.host.name=<%=@ipaddress%>
> > advertised.port=9092
> > # Socket Server Settings
> > #
> >
> > # The port the socket server listens on
> > port=9092
> >
> > # Hostname the broker will bind to and advertise to producers and
> > consumers.
> > # If not set, the server will bind to all interfaces and advertise the
> > value returned from
> > # from java.net.InetAddress.getCanonicalHostName().
> > host.name=<%=@ipaddress%>
> >
> > # The number of threads handling network requests
> > num.network.threads=2
> >
> > # The number of threads doing disk I/O
> > num.io.threads=2
> >
> > # The send buffer (SO_SNDBUF) used by the socket server
> > socket.send.buffer.bytes=1048576
> >
> > # The receive buffer (SO_RCVBUF) used by the socket server
> > socket.receive.buffer.bytes=1048576
> >
> > # The maximum size of a request that the socket server will accept
> > (protection against OOM)
> > socket.request.max.bytes=104857600
> >
> >
> > # Log Basics #
> >
> > # A comma seperated list of directories under which to store log files
> > log.dirs=/tmp/kafka-logs
> >
> > # The number of logical partitions per topic per server. More partitions
> > allow greater parallelism
> > # for consumption, but also mean more files.
> > num.partitions=2
> >
> > # Log Flush Policy
> > #
> >
> > # The following configurations control the flush of data to disk. This is
> > among the most
> > # important performance knob in kafka.
> > # There are a few important trade-offs here:
> > #1. Durability: Unflushed data may be lost if you are not using
> > replication.
> > #2. Latency: Very large flush intervals may lead to latency spikes
> when
> > the flush does occur as there will be a lot of data to flush.
> > #3. Throughput: The flush is generally the most expensive operation,
> > and a small flush interval may lead to exceessive seeks.
> > # The settings below allow one to configure the flush policy to flush
> data
> > after a period of time or
> > # every N messages (or both). This can be done globally and overridden
> on a
> > per-topic basis.
> >
> > # The number of messages to accept before forcing a flush of data to disk
> > log.flush.interval.messages=1
> >
> > # The maximum amount of time a message can sit in a log before we force a
> > flush
> > log.flush.interval.ms=1000
> >
> > # Per-topic overrides for log.flush.interval.ms
> > #log.flush.intervals.ms.per.topic=topic1:1000, topic2:3000
> >
> > # Log 

Re: Fallout from upgrading to kafka 0.9 from 0.8.2.3

2015-12-17 Thread Dana Powers
I don't have much to add on this, but q: what is version 0.8.2.3? I thought
the latest in 0.8 series was 0.8.2.2?

-Dana
On Dec 17, 2015 5:56 PM, "Rajiv Kurian"  wrote:

> Yes we are in the process of upgrading to the new producers. But the
> problem seems deeper than a compatibility issue. We have one environment
> where the old producers work with the new 0.9 broker. Further when we
> reverted our messed up 0.9 environment to 0.8.2.3 the problem with those
> topics didn't go away.
>
> Didn't see any ZK issues on the brokers. There were other topics on the
> very same brokers that didn't seem to be affected.
>
> On Thu, Dec 17, 2015 at 5:46 PM, Jun Rao  wrote:
>
> > Yes, the new java producer is available in 0.8.2.x and we recommend
> people
> > use that.
> >
> > Also, when those producers had the issue, were there any other things
> weird
> > in the broker (e.g., broker's ZK session expires)?
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Dec 17, 2015 at 2:37 PM, Rajiv Kurian 
> wrote:
> >
> > > I can't think of anything special about the topics besides the clients
> > > being very old (Java wrappers over Scala).
> > >
> > > I do think it was using ack=0. But my guess is that the logging was
> done
> > by
> > > the Kafka producer thread. My application itself was not getting
> > exceptions
> > > from Kafka.
> > >
> > > On Thu, Dec 17, 2015 at 2:31 PM, Jun Rao  wrote:
> > >
> > > > Hmm, anything special with those 3 topics? Also, the broker log shows
> > > that
> > > > the producer uses ack=0, which means the producer shouldn't get
> errors
> > > like
> > > > leader not found. Could you clarify on the ack used by the producer?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Thu, Dec 17, 2015 at 12:41 PM, Rajiv Kurian 
> > > wrote:
> > > >
> > > > > The topic which stopped working had clients that were only using
> the
> > > old
> > > > > Java producer that is a wrapper over the Scala producer. Again it
> > > seemed
> > > > to
> > > > > work perfectly in another of our realms where we have the same
> > topics,
> > > > same
> > > > > producers/consumers etc but with less traffic.
> > > > >
> > > > > On Thu, Dec 17, 2015 at 12:23 PM, Jun Rao 
> wrote:
> > > > >
> > > > > > Are you using the new java producer?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Thu, Dec 17, 2015 at 9:58 AM, Rajiv Kurian <
> ra...@signalfx.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > > Answers inline:
> > > > > > >
> > > > > > > On Thu, Dec 17, 2015 at 9:41 AM, Jun Rao 
> > wrote:
> > > > > > >
> > > > > > > > Rajiv,
> > > > > > > >
> > > > > > > > Thanks for reporting this.
> > > > > > > >
> > > > > > > > 1. How did you verify that 3 of the topics are corrupted? Did
> > you
> > > > use
> > > > > > > > DumpLogSegments tool? Also, is there a simple way to
> reproduce
> > > the
> > > > > > > > corruption?
> > > > > > > >
> > > > > > > No I did not. The only reason I had to believe that was no
> > writers
> > > > > could
> > > > > > > write to the topic. I have actually no idea what the problem
> > was. I
> > > > saw
> > > > > > > very frequent (much more than usual) messages of the form:
> > > > > > > INFO  [kafka-request-handler-2]
> > [kafka.server.KafkaApis
> > > > > > >   ]: [KafkaApi-6] Close connection due to error handling
> > > produce
> > > > > > > request with correlation id 294218 from client id  with ack=0
> > > > > > > and also message of the form:
> > > > > > > INFO  [kafka-network-thread-9092-0]
> > > [kafka.network.Processor
> > > > > > >   ]: Closing socket connection to /some ip
> > > > > > > The cluster was actually a critical one so I had no recourse
> but
> > to
> > > > > > revert
> > > > > > > the change (which like noted didn't fix things). I didn't have
> > > enough
> > > > > > time
> > > > > > > to debug further. The only way I could fix it with my limited
> > Kafka
> > > > > > > knowledge was (after reverting) deleting the topic and
> recreating
> > > it.
> > > > > > > I had updated a low priority cluster before that worked just
> > fine.
> > > > That
> > > > > > > gave me the confidence to upgrade this higher priority cluster
> > > which
> > > > > did
> > > > > > > NOT work out. So the only way for me to try to reproduce it is
> to
> > > try
> > > > > > this
> > > > > > > on our larger clusters again. But it is critical that we don't
> > mess
> > > > up
> > > > > > this
> > > > > > > high priority cluster so I am afraid to try again.
> > > > > > >
> > > > > > > > 2. As Lance mentioned, if you are using snappy, make sure
> that
> > > you
> > > > > > > include
> > > > > > > > the right snappy jar (1.1.1.7).
> > > > > > > >
> > > > > > > Wonder why I don't see Lance's email in this thread. Either way
> > we
> > > > are
> > > > > > not
> > > > > > > using compression of any 

Re: failed with LeaderNotAvailableError -

2015-12-17 Thread David Montgomery
So what do I do?  Kill my production servers and rebuild?  Restarting all
services does not work.  This seems kinda extreme.  At this point I feel I
have to kill all servers and rebuild.

Thanks

On Fri, Dec 18, 2015 at 2:28 AM, Dana Powers  wrote:

> Hi Ben and Marko -- great suggestions re: connection failures and docker.
>
> The specific error here is: LeaderNotAvailableError:
> TopicMetadata(topic='topic-test-production', error=5, partitions=[])
>
> That is an error code (5) returned from a MetadataRequest. In this context
> it means that the topic did not exist and so the request triggered an
> auto-create initialization (i.e., the connection was fine). Topic
> initialization tends to take a few seconds to complete, but only needs to
> happen once per topic. A retry here is generally fine. This retry should
> probably be handled under the covers by the client code. So in this case I
> would treat it as a simple kafka-python issue (#488).
>
> -Dana
>
> On Thu, Dec 17, 2015 at 4:58 AM, Ben Davison 
> wrote:
>
> > I probably should have mentioned that this was using Amazon ECS.
> >
> > On Thu, Dec 17, 2015 at 12:18 PM, Marko Bonaći <
> marko.bon...@sematext.com>
> > wrote:
> >
> > > It doesn't have to be FQDN.
> > >
> > > Here's how I run Kafka in a container:
> > > docker run --name st-kafka -p 2181:2181 -p 9092:9092 -e
> > > ADVERTISED_HOST=`docker-machine ip dev-st` -e ADVERTISED_PORT=9092 -d
> > > spotify/kafka
> > >
> > > And then you have access to Kafka on the docker host VM from any other
> > > machine.
> > > BTW I use Spotify's image since it contains both ZK and Kafka, but I
> > think
> > > the latest version they built is 0.8.2.1, so you might have to build
> the
> > > new image yourself if you need 0.9, but that's trivial to do.
> > >
> > > Marko Bonaći
> > > Monitoring | Alerting | Anomaly Detection | Centralized Log Management
> > > Solr & Elasticsearch Support
> > > Sematext  | Contact
> > > 
> > >
> > > On Thu, Dec 17, 2015 at 11:33 AM, Ben Davison <
> ben.davi...@7digital.com>
> > > wrote:
> > >
> > > > Hi David,
> > > >
> > > > Are you running in docker? Are you trying to connect from to a remote
> > > box?
> > > > We found we could connect locally but couldn't connect from another
> > > remote
> > > > host.
> > > >
> > > > (I've just started using kafka also)
> > > >
> > > > We had the same issue and found out: host.name=<%=@ipaddress%>
> needed
> > to
> > > > be
> > > > the FQDN of the box.
> > > >
> > > > Thanks,
> > > >
> > > > Ben
> > > >
> > > > On Thu, Dec 17, 2015 at 5:40 AM, David Montgomery <
> > > > davidmontgom...@gmail.com
> > > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am very concerned about using kafka in production given the below
> > > > > errors:
> > > > >
> > > > > Now issues with myt zookeeper.  Other services use ZK.  Only kafka
> > > fails.
> > > > > I have 2 kafka servers using 8.x.  How do I resolve?  I tried
> > > restarting
> > > > > services for kafka.  Below is my kafka server.properties file
> > > > >
> > > > > 'Traceback (most recent call last):
> > > > >   File
> > > > >
> > > > >
> > > >
> > >
> >
> "/usr/local/lib/python2.7/dist-packages/gevent-1.1b6-py2.7-linux-x86_64.egg/gevent/greenlet.py",
> > > > > line 523, in run
> > > > > result = self._run(*self.args, **self.kwargs)
> > > > >   File "/var/feed-server/ad-server/pixel-server.py", line 145, in
> > > > > send_kafka_message
> > > > > res = producer.send_messages(topic, message)
> > > > >   File "build/bdist.linux-x86_64/egg/kafka/producer/simple.py",
> line
> > > 52,
> > > > in
> > > > > send_messages
> > > > > partition = self._next_partition(topic)
> > > > >   File "build/bdist.linux-x86_64/egg/kafka/producer/simple.py",
> line
> > > 36,
> > > > in
> > > > > _next_partition
> > > > > self.client.load_metadata_for_topics(topic)
> > > > >   File "build/bdist.linux-x86_64/egg/kafka/client.py", line 383, in
> > > > > load_metadata_for_topics
> > > > > kafka.common.check_error(topic_metadata)
> > > > >   File "build/bdist.linux-x86_64/egg/kafka/common.py", line 233, in
> > > > > check_error
> > > > > raise error_class(response)
> > > > > LeaderNotAvailableError:
> TopicMetadata(topic='topic-test-production',
> > > > > error=5, partitions=[])
> > > > >  > > send_kafka_message('topic-test-production',
> > > > > '{"adfadfadf)> failed with LeaderNotAvailableError
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > # limitations under the License.
> > > > > # see kafka.server.KafkaConfig for additional details and defaults
> > > > >
> > > > > # Server Basics
> > > #
> > > > >
> > > > > # The id of the broker. This must be set to a unique integer for
> each
> > > > > broker.
> > > > > broker.id=<%=@broker_id%>
> > > > > advertised.host.name=<%=@ipaddress%>
> > > > > 

Re: failed with LeaderNotAvailableError -

2015-12-17 Thread Dana Powers
No - sorry if I wasn't clear. The "error" is not related to your servers.
Just make sure that you create the topic before sending messages to it. If
you do not create the topic beforehand, the server will auto-create, but it
does take a few seconds to initialize. If you plan to rely on auto-creation
in production, I recommend either wrapping producer.send_messages in a try
/ except and retrying on LeaderNotAvailableError, or alternatively calling
client.ensure_topic_exists(topic) to block until initialization is done.

Also I'll try to get out a new release of kafka-python soon that handles
the retry internally.
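A minimal sketch of both options with the same old kafka-python API as the traceback above (broker address, retry count, sleep and payload are arbitrary examples):

import time

from kafka import KafkaClient, SimpleProducer
from kafka.common import LeaderNotAvailableError

client = KafkaClient('broker-host:9092')  # example address
producer = SimpleProducer(client)

# Option 1: block until the (auto-created) topic has a leader for its partitions.
client.ensure_topic_exists('topic-test-production')

# Option 2: retry the send while topic auto-creation finishes.
for attempt in range(5):
    try:
        producer.send_messages('topic-test-production', b'some payload')
        break
    except LeaderNotAvailableError:
        time.sleep(1)  # metadata usually settles within a few seconds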

-Dana
On Dec 17, 2015 6:17 PM, "David Montgomery" 
wrote:

> So what do I do?  Kill my production servers and rebuild?  Restarting all
> services does not work.  This seems kinda extreme.  At this point I feel I
> have to kill all servers and rebuild.
>
> Thanks
>
> On Fri, Dec 18, 2015 at 2:28 AM, Dana Powers 
> wrote:
>
> > Hi Ben and Marko -- great suggestions re: connection failures and docker.
> >
> > The specific error here is: LeaderNotAvailableError:
> > TopicMetadata(topic='topic-test-production', error=5, partitions=[])
> >
> > That is an error code (5) returned from a MetadataRequest. In this
> context
> > it means that the topic did not exist and so the request triggered an
> > auto-create initialization (i.e., the connection was fine). Topic
> > initialization tends to take a few seconds to complete, but only needs to
> > happen once per topic. A retry here is generally fine. This retry should
> > probably be handled under the covers by the client code. So in this case
> I
> > would treat it as a simple kafka-python issue (#488).
> >
> > -Dana
> >
> > On Thu, Dec 17, 2015 at 4:58 AM, Ben Davison 
> > wrote:
> >
> > > I probably should have mentioned that this was using Amazon ECS.
> > >
> > > On Thu, Dec 17, 2015 at 12:18 PM, Marko Bonaći <
> > marko.bon...@sematext.com>
> > > wrote:
> > >
> > > > It doesn't have to be FQDN.
> > > >
> > > > Here's how I run Kafka in a container:
> > > > docker run --name st-kafka -p 2181:2181 -p 9092:9092 -e
> > > > ADVERTISED_HOST=`docker-machine ip dev-st` -e ADVERTISED_PORT=9092 -d
> > > > spotify/kafka
> > > >
> > > > And then you have access to Kafka on the docker host VM from any
> other
> > > > machine.
> > > > BTW I use Spotify's image since it contains both ZK and Kafka, but I
> > > think
> > > > the latest version they built is 0.8.2.1, so you might have to build
> > the
> > > > new image yourself if you need 0.9, but that's trivial to do.
> > > >
> > > > Marko Bonaći
> > > > Monitoring | Alerting | Anomaly Detection | Centralized Log
> Management
> > > > Solr & Elasticsearch Support
> > > > Sematext  | Contact
> > > > 
> > > >
> > > > On Thu, Dec 17, 2015 at 11:33 AM, Ben Davison <
> > ben.davi...@7digital.com>
> > > > wrote:
> > > >
> > > > > Hi David,
> > > > >
> > > > > Are you running in docker? Are you trying to connect from to a
> remote
> > > > box?
> > > > > We found we could connect locally but couldn't connect from another
> > > > remote
> > > > > host.
> > > > >
> > > > > (I've just started using kafka also)
> > > > >
> > > > > We had the same issue and found out: host.name=<%=@ipaddress%>
> > needed
> > > to
> > > > > be
> > > > > the FQDN of the box.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Ben
> > > > >
> > > > > On Thu, Dec 17, 2015 at 5:40 AM, David Montgomery <
> > > > > davidmontgom...@gmail.com
> > > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am very concerned about using kafka in production given the
> below
> > > > > > errors:
> > > > > >
> > > > > > Now issues with myt zookeeper.  Other services use ZK.  Only
> kafka
> > > > fails.
> > > > > > I have 2 kafka servers using 8.x.  How do I resolve?  I tried
> > > > restarting
> > > > > > services for kafka.  Below is my kafka server.properties file
> > > > > >
> > > > > > 'Traceback (most recent call last):
> > > > > >   File
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> "/usr/local/lib/python2.7/dist-packages/gevent-1.1b6-py2.7-linux-x86_64.egg/gevent/greenlet.py",
> > > > > > line 523, in run
> > > > > > result = self._run(*self.args, **self.kwargs)
> > > > > >   File "/var/feed-server/ad-server/pixel-server.py", line 145, in
> > > > > > send_kafka_message
> > > > > > res = producer.send_messages(topic, message)
> > > > > >   File "build/bdist.linux-x86_64/egg/kafka/producer/simple.py",
> > line
> > > > 52,
> > > > > in
> > > > > > send_messages
> > > > > > partition = self._next_partition(topic)
> > > > > >   File "build/bdist.linux-x86_64/egg/kafka/producer/simple.py",
> > line
> > > > 36,
> > > > > in
> > > > > > _next_partition
> > > > > > self.client.load_metadata_for_topics(topic)
> > > > > >   File "build/bdist.linux-x86_64/egg/kafka/client.py", line 383,
> in
> > > > > >