stable release?

2016-01-04 Thread Jason Rosenberg
All, I see that 0.8.2.2 is still listed as the 'stable release', while 0.9.0.0 is the 'latest release', for kafka. At what point do we expect 0.9.X to become 'stable'? Will it be 0.9.0.1? Also, I assume more than a few of us have upgraded to 0.9.0.0 for production environments, any reports so

Re: are 0.8.2.1 and 0.9.0.0 compatible?

2015-10-01 Thread Jason Rosenberg
Of course, that documentation needs to be updated to refer to '0.9.X'! Also, I'm wondering if the last step there should be changed to remove the property altogether and restart (rather than setting it to the new version), since once the code is updated, it will use that by default? On Thu, Oct

Re: Log Cleaner Thread Stops

2015-09-23 Thread Jason Rosenberg
It looks like that fix will not be included in a release until 0.9.0.0. I'm thinking maybe it makes sense not to switch to kafka storage for offsets until then? Jason On Fri, Sep 18, 2015 at 1:25 PM, Todd Palino wrote: > I think the last major issue with log compaction

Re: 0.9.0.0 remaining jiras

2015-09-15 Thread Jason Rosenberg
/browse/KAFKA- (seems important to fix) Jason On Tue, Sep 15, 2015 at 3:28 PM, Jason Rosenberg <j...@squareup.com> wrote: > Yep, > > It looks like this was only communicated originally to the dev list (and > not the users list), so it wasn't obvious to all! > > Thanks,

Re: 0.9.0.0 remaining jiras

2015-09-15 Thread Jason Rosenberg
> > http://mail-archives.apache.org/mod_mbox/kafka-dev/201509.mbox/%3CCAFc58G-UScVKrSF1kdsowQ8Y96OAaZEdiZsk40G8fwf7iToFaw%40mail.gmail.com%3E > > Kind regards, > Stevo Slavic. > > > On Mon, Sep 14, 2015 at 8:56 AM, Jason Rosenberg <j...@squareup.com> wrote: > > > Hi Jun

Re: 0.9.0.0 remaining jiras

2015-09-14 Thread Jason Rosenberg
Hi Jun, Can you clarify, will there not be a 0.8.3.0 (and instead we move straight to 0.9.0.0)? Also, can you outline the main new features/updates for 0.9.0.0? Thanks, Jason On Sat, Sep 12, 2015 at 12:40 PM, Jun Rao wrote: > The following is a candidate list of jiras that

Re: [ANNOUNCE] Burrow - Consumer Lag Monitoring as a Service

2015-06-09 Thread Jason Rosenberg
Hi Todd, Thanks for open sourcing this, I'm excited to take a look. It looks like it's specific to offsets stored in kafka (and not zookeeper) correct? I assume by that that LinkedIn is using the kafka storage now in production? Jason On Thu, Jun 4, 2015 at 9:43 PM, Todd Palino

Re: How to prevent custom Partitioner from increasing the number of producer's requests?

2015-06-04 Thread Jason Rosenberg
point about breaking batch into 2 separate partitions. With that code, I jump to a new partition on message 201, 401, 601, ... with batch size = 200, where is my mistake? Thanks for your help, Sébastien 2015-06-02 16:55 GMT+02:00 Jason Rosenberg j...@squareup.com: Hi Sebastien

Re: Consumer lag lies - orphaned offsets?

2015-06-04 Thread Jason Rosenberg
I assume you are looking at a 'MaxLag' metric, which reports the worst case lag over a set of partitions. Are you consuming multiple partitions, and maybe one of them is stuck? On Tue, Jun 2, 2015 at 4:00 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I've noticed that when we

Re: Issue with kafka-topics.sh tool for adding new partitions with replica assignment

2015-06-03 Thread Jason Rosenberg
Probably makes sense to file a Jira for this issue. On Mon, May 11, 2015 at 8:28 AM, Stefan Schadwinkel stefan.schadwin...@smaato.com wrote: Hi, with Kafka 0.8 it is possible to add new partitions on newly added brokers and supply a partition assignment to put the new partitions mainly on

Re: How to prevent custom Partitioner from increasing the number of producer's requests?

2015-06-02 Thread Jason Rosenberg
Hi Sebastien, You might just try using the default partitioner (which is random). It works by choosing a random partition each time it re-polls the meta-data for the topic. By default, this happens every 10 minutes for each topic you produce to (so it evenly distributes load at a granularity of
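The behaviour described in this reply (pick a random partition, then keep using it until the next metadata refresh) can be illustrated with a toy model. This is only a sketch of the idea, not Kafka's producer code: the class name StickyRandomPartitioner, its parameters, and the 600-second default (the 10 minutes mentioned above) are assumptions chosen for illustration.

```python
import random


class StickyRandomPartitioner:
    """Toy model: choose a random partition and reuse it until the
    refresh interval elapses, then choose again."""

    def __init__(self, num_partitions, refresh_interval_s=600.0, rng=None):
        self.num_partitions = num_partitions
        self.refresh_interval_s = refresh_interval_s  # hypothetical default: 10 minutes
        self.rng = rng or random.Random()
        self._partition = None
        self._chosen_at = None

    def partition(self, now_s):
        """Return the partition to use for a message sent at time now_s (seconds)."""
        expired = (
            self._chosen_at is None
            or now_s - self._chosen_at >= self.refresh_interval_s
        )
        if expired:
            # Simulates the re-poll of topic metadata: pick a fresh random partition.
            self._partition = self.rng.randrange(self.num_partitions)
            self._chosen_at = now_s
        return self._partition


if __name__ == "__main__":
    p = StickyRandomPartitioner(num_partitions=8)
    # All sends inside one interval land on the same partition.
    print([p.partition(t) for t in (0.0, 1.0, 300.0, 599.0)])
    # After the interval a new random choice is made (it may repeat by chance).
    print(p.partition(600.0))
```

In this model every message sent within one interval goes to a single partition, so load only evens out across intervals, which matches the granularity caveat in the reply.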

Re: Cascading failures on running out of disk space

2015-06-01 Thread Jason Rosenberg
Hi Jananee, Do you know for sure that you ran out of disk space completely? Did you see any IOExceptions failing to write? Normally, when that happens, the broker is supposed to immediately shut itself down. Did the one broker shut itself down? The NotLeaderForPartitionException's are normal when

Re: Broker error: failed due to Leader not local for partition

2015-06-01 Thread Jason Rosenberg
failed due to leader not local for partition usually occurs in response to client requests that make a fetch or produce request to a partition, to the wrong broker (e.g. to a follower and not the leader for the partition). Clients need to make a meta-data request first to determine the leader

Re: Kafka broker - Ip-address instead of host naem

2015-06-01 Thread Jason Rosenberg
Daniel, Are you sure about that (it's not what I would have understood). Generally, the way to do it is use a round-robin dns entry, which returns successive nodes for successive requests. Kafka will retry a zookeeper request on failure (and in the process get re-routed to a new zk node). If a

Re: leader update partitions fail with KeeperErrorCode = BadVersion,kafka version=0.8.1.1

2015-06-01 Thread Jason Rosenberg
I've seen this problem now too with 0.8.2.1. It happened after we had a disk failure (but the server failed to shutdown: KAFKA-). After that happened, subsequently, several ISR's underwent I think 'unclean leader election', but I'm not 100% sure. But I did see lots of those same error

Re: aborting a repartition assignment

2015-06-01 Thread Jason Rosenberg
As Lance mentioned, the best course of action in such a case (since version 0.8.X) is to keep the failed broker down, and bring up a new node (with the same broker id as the failed broker), and it will automatically re-sync its replicas (which may take some time). You don't want to try to

Re: Ordered Message Queue with Pool of Consumers

2015-06-01 Thread Jason Rosenberg
How would you apply total ordering if multiple messages are being consumed in parallel? If message-1 and message-2 are being consumed in parallel, do you really mean you want to guarantee that message-1 is consumed before the consumption of message-2 begins? On Tue, May 26, 2015 at 1:34 PM,

Re: consumer poll returns no records unless called more than once, why?

2015-06-01 Thread Jason Rosenberg
Ben, It could also be related to how you initialize auto.offset.reset. In unit tests, you generally want to set it to 'smallest' to avoid race conditions between producing and consuming. Jason On Wed, May 20, 2015 at 2:32 PM, Padgett, Ben bpadg...@illumina.com wrote: Thanks for the

Re: KafkaException: Size of FileMessageSet has been truncated during write

2015-06-01 Thread Jason Rosenberg
Might be good to have a more friendly error message though! On Thu, May 28, 2015 at 4:32 PM, Andrey Yegorov andrey.yego...@gmail.com wrote: Thank you! -- Andrey Yegorov On Wed, May 27, 2015 at 4:42 PM, Jiangjie Qin j...@linkedin.com.invalid wrote: This should be just a message

Re: Kafka partitions unbalanced

2015-06-01 Thread Jason Rosenberg
Andrew Otto, This is a known problem (and one which I have run into as well). Generally, my solution has been to increase the number of partitions such that the granularity of partitions is much higher than the number of disks, so that it's less likely for the imbalance to be significant. I

Re: Offset management: client vs broker side responsibility

2015-06-01 Thread Jason Rosenberg
Stevo, Both of the main solutions used by the high-level consumer are standardized and supported directly by the kafka client libraries (e.g. maintaining offsets in zookeeper or in kafka itself). And for the zk case, there is the consumer offset checker (which is good for monitoring). Consumer

Re: How to achieve distributed processing and high availability simultaneously in Kafka?

2015-05-06 Thread Jason Rosenberg
A consumer thread can consume multiple partitions. This is not unusual, in practice. In the example you gave, if multiple high-level consumers are using the same group id, they will automatically rebalance the partition assignment between them as consumers dynamically join and leave the group.

Re: circuit breaker for producer

2015-05-05 Thread Jason Rosenberg
Guozhang, Do you have the ticket number for possibly adding in local log file failover? Is it actively being worked on? Thanks, Jason On Tue, May 5, 2015 at 6:11 PM, Guozhang Wang wangg...@gmail.com wrote: Does this log file act as a temporary disk buffer when broker slows down, whose data

Re: 'roundrobin' partition assignment strategy restrictions

2015-05-05 Thread Jason Rosenberg
retry should handle it. But if you want to canary a new topic setting on one consumer for some time, it won’t work. Could you maybe share the use case with more detail? So we can see if there is any workaround. Jiangjie (Becket) Qin On 3/22/15, 10:04 AM, Jason Rosenberg j...@squareup.com

Re: Round Robin Partition Assignment

2015-05-05 Thread Jason Rosenberg
I asked about this same issue in a previous thread. Thanks for reminding me, I've added this Jira: https://issues.apache.org/jira/browse/KAFKA-2172 I think this is a great new feature, but unfortunately the requirement that all consumers must be the same is just a bit too restrictive. Jason On Tue, May

expected behavior if a node undergoes unclean shutdown

2015-04-08 Thread Jason Rosenberg
Hello, I'm still trying to get to the bottom of an issue we had previously, with an unclean shutdown during an upgrade to 0.8.2.1 (from 0.8.1.1). In that case, the controlled shutdown was interrupted, and the node was shutdown abruptly. This resulted in about 5 minutes of unavailability for

Re: expected behavior if a node undergoes unclean shutdown

2015-04-08 Thread Jason Rosenberg
I've confirmed that the same thing happens even if it's not the controller that's killed hard. Also, in several trials, it took between 10-30 seconds to recover. Jason On Wed, Apr 8, 2015 at 1:31 PM, Jason Rosenberg j...@squareup.com wrote: Hello, I'm still trying to get to the bottom

Re: Problem with node after restart no partitions?

2015-04-07 Thread Jason Rosenberg
, Thunder -Original Message- From: Jason Rosenberg [mailto:j...@squareup.com] Sent: Friday, April 03, 2015 10:50 AM To: users@kafka.apache.org Subject: Re: Problem with node after restart no partitions? I will provide what I can (we don't have separate logs for controller, etc

Re: Problem with node after restart no partitions?

2015-04-03 Thread Jason Rosenberg
issue... Could you provide the controller log and the log for the first broker on which you tried controlled shutdown and upgrade? On 4/3/15, 8:57 AM, Jason Rosenberg j...@squareup.com wrote: I'm preparing a longer post here, but we recently ran into a similar scenario. Not sure yet if it's

Re: How to consume from a specific topic, as well as a wildcard of topics?

2015-04-03 Thread Jason Rosenberg
Yeah, I think you need to have 2 consumer connectors (I routinely have multiple consumer connectors co-existing in the same app). That error message about the ephemeral node is really annoying, by the way. It happens under lots of scenarios (at least it did under 0.8.1.1), where it simply never

Re: Which version works for kafka 0.8.2 as consumer?

2015-04-03 Thread Jason Rosenberg
Is there a reason the incomplete version was included in the 0.8.2.1 release? On Wed, Apr 1, 2015 at 1:02 PM, Mayuresh Gharat gharatmayures...@gmail.com wrote: What do you mean by offset syncing? Thanks, Mayuresh On Wed, Apr 1, 2015 at 9:59 AM, pushkar priyadarshi

Re: New kafka client for Go (golang)

2015-04-03 Thread Jason Rosenberg
How does it compare to Sarama? On Mon, Mar 30, 2015 at 3:09 PM, Piotr Husiatyński p...@optiopay.com wrote: Hi, I wanted to share new client library for Go language that I'm developing at Optiopay. Library is MIT licensed and provides API close to latest kafka request/response wire format

Re: Side-by-side migration for 0.7 to 0.8

2015-04-03 Thread Jason Rosenberg
Hi Patrick, When we went through this, we 'shaded' the old kafka jar, so the 2 could co-exist in the same app. We use maven, and there's a maven 'shade plugin', etc. In our case, it was intractable to try to update all producers and consumers in one go as you suggest, so we had to have a way in

Re: Problem with node after restart no partitions?

2015-04-03 Thread Jason Rosenberg
I'm preparing a longer post here, but we recently ran into a similar scenario. Not sure yet if it's the same thing you saw (but it feels similar). We were also doing a rolling upgrade from 0.8.1.1 to 0.8.2.1, and during the controlled shutdown of the first node (of a 4 node cluster), the

Re: Anyone interested in speaking at Bay Area Kafka meetup @ LinkedIn on March 24?

2015-03-23 Thread Jason Rosenberg
Hi Jon, The link for the 1/27 meetup you posted works for me, but I haven't found how to find that same link on the meetup site (there are links that point to the live stream, which of course is no longer happening!). Thoughts? Thanks, Jason On Mon, Mar 2, 2015 at 11:31 AM, Jon Bringhurst

how to remove consumer group.id with storage=kafka

2015-02-26 Thread Jason Rosenberg
All, There exists code in the sample console consumer that ships with kafka, that will remove consumer group id's from zookeeper, for the case where it's just a short-lived session using an auto-generated groupid. It's a bit of a hack, but it works (keeps the number of groupids from

Re: question about new consumer offset management in 0.8.2

2015-02-05 Thread Jason Rosenberg
Thanks, Joel On Thu, Feb 5, 2015 at 2:21 PM, Joel Koshy jjkosh...@gmail.com wrote: This is documented in the official docs: http://kafka.apache.org/documentation.html#distributionimpl On Thu, Feb 05, 2015 at 01:23:01PM -0500, Jason Rosenberg wrote: What are the defaults

Re: question about new consumer offset management in 0.8.2

2015-02-05 Thread Jason Rosenberg
for a given consumer group won't be lost? Jason On Thu, Feb 5, 2015 at 2:21 PM, Joel Koshy jjkosh...@gmail.com wrote: This is documented in the official docs: http://kafka.apache.org/documentation.html#distributionimpl On Thu, Feb 05, 2015 at 01:23:01PM -0500, Jason Rosenberg wrote: What

question about new consumer offset management in 0.8.2

2015-02-05 Thread Jason Rosenberg
Hi, For 0.8.2, one of the features listed is: - Kafka-based offset storage. Is there documentation on this (I've heard discussion of it of course)? Also, is it something that will be used by existing consumers when they migrate up to 0.8.2? What is the migration process? Thanks, Jason

Re: question about new consumer offset management in 0.8.2

2015-02-05 Thread Jason Rosenberg
(Commit offsets to Zookeeper and Kafka). 3) Set dual.commit.enabled=false and offsets.storage=kafka and restart (Commit offsets to Kafka only). -Jon On Feb 5, 2015, at 9:03 AM, Jason Rosenberg j...@squareup.com wrote: Hi, For 0.8.2, one of the features listed is: - Kafka
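The staged move to Kafka-based offset storage quoted above can be sketched as a small helper that prints the consumer properties for each restart. The property names offsets.storage and dual.commit.enabled come from the quoted step 3; the values shown for the dual-commit stage are an assumption, since that part of the message is cut off in this preview, and STAGES and render_properties are hypothetical names used only for illustration.

```python
# Each stage is (description, consumer property overrides).
# The first stage's values are assumed; only its description
# ("Commit offsets to Zookeeper and Kafka") is visible in the preview.
STAGES = [
    (
        "commit offsets to ZooKeeper and Kafka",
        {"offsets.storage": "kafka", "dual.commit.enabled": "true"},
    ),
    (
        "commit offsets to Kafka only",
        {"offsets.storage": "kafka", "dual.commit.enabled": "false"},
    ),
]


def render_properties(overrides):
    """Format property overrides as key=value lines, sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(overrides.items()))


if __name__ == "__main__":
    for description, overrides in STAGES:
        print(f"# restart consumers with: {description}")
        print(render_properties(overrides))
        print()
```

Each stage implies a restart of every consumer in the group before moving on to the next one, as in the quoted steps.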

Re: Poll: Producer/Consumer impl/language you use?

2015-01-28 Thread Jason Rosenberg
I think the results could be a bit skewed, in cases where an organization uses multiple languages, but not equally. In our case, we overwhelmingly use java clients (90%). But we also have ruby and Go clients too. But in the poll, these come out as equally used client languages. Jason On Wed,

Re: Missing Per-Topic BrokerTopicMetrics in v0.8.2.0

2015-01-27 Thread Jason Rosenberg
previous email, I thought that only the per-topic BrokerTopicMetrics were missing, but also several other per-topic metrics are missing too, e.g. under kafka.log, etc. Jason On Tue, Jan 27, 2015 at 2:20 AM, Jason Rosenberg j...@squareup.com wrote: I can confirm that the per topic metrics

Re: Missing Per-Topic BrokerTopicMetrics in v0.8.2.0

2015-01-27 Thread Jason Rosenberg
= 30.08 bytes/s 15-minute rate = 50.27 bytes/s Manikumar On Tue, Jan 27, 2015 at 1:59 PM, Jason Rosenberg j...@squareup.com wrote: Ok, It looks like the yammer MetricName is not being created correctly for the sub metrics that include a topic. E.g. a metric with an mbeanName like

Re: Missing Per-Topic BrokerTopicMetrics in v0.8.2.0

2015-01-27 Thread Jason Rosenberg
-1481. Topic name (and clientId, etc) can have dash in it and it's hard to parse. Thanks, Jun On Tue, Jan 27, 2015 at 6:30 AM, Jason Rosenberg j...@squareup.com wrote: Remember multiple people have reported this issue. Per topic metrics no longer appear in graphite (or in any

Re: Regarding Kafka release 0.8.2-beta

2015-01-26 Thread Jason Rosenberg
shouldn't the new consumer api be removed from the 0.8.2 code base then? On Fri, Jan 23, 2015 at 10:30 AM, Joe Stein joe.st...@stealth.ly wrote: The new consumer is scheduled for 0.9.0. Currently Kafka release candidate 2 for 0.8.2.0 is being voted on. There is an in progress patch to the

Re: unable to delete topic with 0.8.2 rc2

2015-01-26 Thread Jason Rosenberg
. So if you issued a delete topic command and you have producers running or consumers? too which is issuing a TopicMetadataRequest then the topic will be recreated. -Harsha On Sun, Jan 25, 2015, at 11:26 PM, Jason Rosenberg wrote: cversion did change (incremented by 2) when I issue

Re: Missing Per-Topic BrokerTopicMetrics in v0.8.2.0

2015-01-26 Thread Jason Rosenberg
I can confirm that the per topic metrics are not coming through to the yammer metrics registry. I do see them in jmx (via jconsole), but the MetricsRegistry does not have them. All the other metrics are coming through that appear in jmx. This is with single node instance running locally. Jason

Re: unable to delete topic with 0.8.2 rc2

2015-01-25 Thread Jason Rosenberg
yes On Mon, Jan 26, 2015 at 12:18 AM, Jun Rao j...@confluent.io wrote: Do you have delete.topic.enable turned on in all brokers? Thanks, Jun On Sun, Jan 25, 2015 at 7:56 PM, Jason Rosenberg j...@squareup.com wrote: So far, I have been unable to get delete topic to work, with release

Re: warning on startup of consumer app with 0.8.2 rc2

2015-01-25 Thread Jason Rosenberg
. Thanks, Jun On Sun, Jan 25, 2015 at 7:05 PM, Jason Rosenberg j...@squareup.com wrote: I don't think it's really unusual for deployment environments to produce single shaded jars for an app. Thus, I'm wondering if we can't rethink this here? E.g. just have a constant in code which states

Re: warning on startup of consumer app with 0.8.2 rc2

2015-01-25 Thread Jason Rosenberg
On Fri, Jan 23, 2015 at 5:24 PM, Jun Rao j...@confluent.io wrote: The only impact is that you don't get the mbean that tells you the version of the jar. That's why it's just a warning. Thanks, Jun On Fri, Jan 23, 2015 at 1:04 PM, Jason Rosenberg j...@squareup.com wrote: What

Re: warning on startup of consumer app with 0.8.2 rc2

2015-01-23 Thread Jason Rosenberg
-console-consumer in 0.8.2 rc2 is running fine. Do you have multiple kafka jars in your classpath? Thanks, Jun On Thu, Jan 22, 2015 at 4:58 PM, Jason Rosenberg j...@squareup.com wrote: 2015-01-23 00:55:25,273 WARN [async-message-sender-0] common.AppInfo$ - Can't read Kafka version from

Re: warning on startup of consumer app with 0.8.2 rc2

2015-01-23 Thread Jason Rosenberg
the following in the repacked jar that was included in the original Kafka jar. Our code looks for version info from there. META-INF/ META-INF/MANIFEST.MF Thanks, Jun On Fri, Jan 23, 2015 at 11:59 AM, Jason Rosenberg j...@squareup.com wrote: In this case, we have a single shaded jar

warning on startup of consumer app with 0.8.2 rc2

2015-01-22 Thread Jason Rosenberg
2015-01-23 00:55:25,273 WARN [async-message-sender-0] common.AppInfo$ - Can't read Kafka version from MANIFEST.MF. Possible cause: java.lang.NullPointerException

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Jason Rosenberg
In our case, we use protocol buffers for all messages, and these have simple serialization/deserialization builtin to the protobuf libraries (e.g. MyProtobufMessage.toByteArray()). Also, we often produce/consume messages without conversion to/from protobuf Objects (e.g. in cases where we are just

Re: Re: How to push metrics to graphite - jmxtrans does not work

2014-12-02 Thread Jason Rosenberg
fwiw, we wrap the kafka server in our java service container framework. This allows us to use the default GraphiteReporter class that is part of the yammer metrics library (which is used by kafka directly). So it works seamlessly. (We've since changed our use of GraphiteReporter to instead send

Re: Kafka restart takes a long time

2014-11-23 Thread Jason Rosenberg
Rajiv, So, any time a broker's disk fills up, it will shut itself down immediately (it will do this in response to any IO error on writing to disk). Unfortunately, this means that the node will not be able to do any housecleaning before shutdown, which is an 'unclean' shutdown. This means that

new producer api and batched Futures....

2014-11-20 Thread Jason Rosenberg
I've been looking at the new producer api with anticipation, but have not fired it up yet. One question I have, is it looks like there's no longer a 'batch' send mode (and I get that this is all now handled internally, e.g. you send individual messages, that then get collated and batched up and

Re: new producer api and batched Futures....

2014-11-20 Thread Jason Rosenberg
the new producer isn't as fast as the old producer let us know. -Jay On Thu, Nov 20, 2014 at 4:24 PM, Jason Rosenberg j...@squareup.com wrote: I've been looking at the new producer api with anticipation, but have not fired it up yet. One question I have, is it looks like there's

Re: ISR shrink to 0?

2014-11-19 Thread Jason Rosenberg
, Jun On Tue, Nov 18, 2014 at 10:36 PM, Jason Rosenberg j...@squareup.com wrote: Ok, Makes sense. But if the node is not actually healthy (and underwent a hard crash) it would likely not be able to avoid an 'unclean' restart. What happens if unclean leader election is disabled

Re: ISR shrink to 0?

2014-11-18 Thread Jason Rosenberg
to it)... Jason On Mon, Nov 17, 2014 at 2:06 PM, Jason Rosenberg j...@squareup.com wrote: We have had 2 nodes in a 4 node cluster die this weekend, sadly. Fortunately there was no critical data on these machines yet. The cluster is running 0.8.1.1, and using replication factor of 2 for 2 topics, each

Re: selecting java producer (0.8.2 or 0.8.1.1?)

2014-11-18 Thread Jason Rosenberg
Hi Jun, Is this the official java doc for the new producer (www.trieuvan.com)? I'm not seeing any links to it (or any documentation) on the apache kafka site (am I overlooking it)? Should there be a link to it in the 0.8.2-beta documentation page? Jason On Tue, Nov 18, 2014 at 7:23 PM, Jun

Re: selecting java producer (0.8.2 or 0.8.1.1?)

2014-11-18 Thread Jason Rosenberg
: @allthingshadoop / On Nov 18, 2014 10:33 PM, Jason Rosenberg j...@squareup.com wrote: Hi Jun, Is this the official java doc for the new producer (www.trieuvan.com)? I'm not seeing any links to it (or any documentation) on the apache kafka site (am I

Re: ISR shrink to 0?

2014-11-18 Thread Jason Rosenberg
Rao jun...@gmail.com wrote: Yes, we will preserve the last replica in ISR. This way, we know which replica has all committed messages and can wait for it to come back as the leader, if unclean leader election is disabled. Thanks, Jun On Mon, Nov 17, 2014 at 11:06 AM, Jason Rosenberg j

Re: HL publishing, retries and potential race condition

2014-11-16 Thread Jason Rosenberg
This has apparently been fixed in 0.8.2: https://issues.apache.org/jira/browse/KAFKA-899 On Mon, Oct 6, 2014 at 3:02 PM, Jun Rao jun...@gmail.com wrote: Yes, transient error like LeaderNotAvailableException can happen. If you configure enough retries, then you shouldn't see the exception in

Re: OffsetOutOfRange errors

2014-11-07 Thread Jason Rosenberg
The bottom line, is you are likely not consuming messages fast enough, so you are falling behind. So, you are steadily consuming older and older messages, and eventually you are consuming messages older than the retention time window set for your kafka broker. That's the typical scenario for

corrupt recovery checkpoint file issue....

2014-11-06 Thread Jason Rosenberg
Hi, We recently had a kafka node go down suddenly. When it came back up, it apparently had a corrupt recovery file, and refused to startup: 2014-11-06 08:17:19,299 WARN [main] server.KafkaServer - Error starting up KafkaServer java.lang.NumberFormatException: For input string:

Re: corrupt recovery checkpoint file issue....

2014-11-06 Thread Jason Rosenberg
forgot to mention, we are using 0.8.1.1 Jason On Thu, Nov 6, 2014 at 9:31 AM, Jason Rosenberg j...@squareup.com wrote: Hi, We recently had a kafka node go down suddenly. When it came back up, it apparently had a corrupt recovery file, and refused to startup: 2014-11-06 08:17:19,299

Re: zookeeper upgrade or remove zookeeper dependency

2014-11-06 Thread Jason Rosenberg
We have been using zk 3.4.6 (and we use curator), without any problems with kafka, for quite a while now Jason On Thu, Sep 18, 2014 at 2:18 PM, Mingtao Zhang mail2ming...@gmail.com wrote: Great :) Best Regards, Mingtao On Thu, Sep 18, 2014 at 2:04 PM, Guozhang Wang wangg...@gmail.com

Re: Disactivating Yammer Metrics Monitoring

2014-11-06 Thread Jason Rosenberg
Hi Francois, We had the exact same problem. We embed Kafka in our service container, and we use yammer metrics to see data about the whole app (e.g. kafka, the jvm, the service container wrapping it). However, as you observed, by default, kafka produces an insane amount of metrics. So what we

Re: Interaction of retention settings for broker and topic plus partitions

2014-11-06 Thread Jason Rosenberg
Jun, To clarify though, is it correct that a per topic limit will always override the default limit of the same type? (e.g. a large per-topic retention hours vs. a small default retention hours)? Jason On Sat, Sep 20, 2014 at 12:28 AM, Jun Rao jun...@gmail.com wrote: That's right. The rule

Re: corrupt recovery checkpoint file issue....

2014-11-06 Thread Jason Rosenberg
file to the final file. This is done to prevent corruption caused by a crash in the middle of the writes. In your case, was the host crashed? What kind of storage system are you using? Is there any non-volatile cache on the storage system? Thanks, Jun On Thu, Nov 6, 2014 at 6:31 AM, Jason

Re: corrupt recovery checkpoint file issue....

2014-11-06 Thread Jason Rosenberg
filed: https://issues.apache.org/jira/browse/KAFKA-1758 On Thu, Nov 6, 2014 at 11:50 PM, Jason Rosenberg j...@squareup.com wrote: I'm still not sure what caused the reboot of the system (but yes it appears to have crashed hard). The file system is xfs, on CentOs linux. I'm not yet sure

Re: [ANNOUNCEMENT] Apache Kafka 0.8.2-beta Released

2014-11-03 Thread Jason Rosenberg
Are there any config parameter updates/changes? I see the doc here: http://kafka.apache.org/documentation.html#configuration now defaults to 0.8.2-beta. But it would be useful to know if anything has changed from 0.8.1.1, just so we can be sure to update things, etc. On Sat, Nov 1, 2014 at

Re: [ANNOUNCEMENT] Apache Kafka 0.8.2-beta Released

2014-11-03 Thread Jason Rosenberg
Also, that doc refers to the 'new producer' as available in trunk and of beta quality. But from the announcement, it seems it's now more properly integrated in the release? Also, where can I read about the 'kafka-client' referred to above? Thanks, Jason On Mon, Nov 3, 2014 at 4:46 PM, Jason

consumer rebalance weirdness

2014-08-07 Thread Jason Rosenberg
We've noticed that some of our consumers are more likely to repeatedly trigger rebalancing when the app is consuming messages more slowly (e.g. persisting data to back-end systems, etc.). If on the other hand we 'fast-forward' the consumer (which essentially means we tell it to consume but do

Re: consumer rebalance weirdness

2014-08-07 Thread Jason Rosenberg
application possibly timing out its zookeeper connection during consumption while doing its processing, thus triggering the rebalance? -Clark On 8/6/14, 11:18 PM, Jason Rosenberg j...@squareup.com wrote: We've noticed that some of our consumers are more likely to repeatedly trigger rebalancing when

Re: delete topic ?

2014-08-07 Thread Jason Rosenberg
Since the deletion stuff is now in trunk, would it be compatible to issue the command from a jar built from trunk, against a running 0.8.1.1 cluster? Or does the cluster also have to be running trunk? (I'm guessing it does :)). I have some topics I'd like to delete, but don't want to wait for

Re: consumer rebalance weirdness

2014-08-07 Thread Jason Rosenberg
it not to check in with ZK for longer than the timeout. - http://www.philipotoole.com On Thursday, August 7, 2014 8:16 AM, Jason Rosenberg j...@squareup.com wrote: Well, it's possible that when processing, it might take longer than the zookeeper

Re: much reduced io utilization after upgrade to 0.8.0 - 0.8.1.1

2014-07-23 Thread Jason Rosenberg
as it will schedule them in an order friendly to the layout on disk and do a good job of merging adjacent writes. However if you are explicitly configuring an fsync policy (either by time or number of messages) then this is likely not the cause. -Jay On Tue, Jul 22, 2014 at 9:37 PM, Jason

much reduced io utilization after upgrade to 0.8.0 - 0.8.1.1

2014-07-22 Thread Jason Rosenberg
I recently upgraded some of our kafka clusters to use 0.8.1.1 (from 0.8.0). It's all looking good so far. One thing I notice though (seems like a good thing) is that the iostat utilization has gone way down after the upgrade. I'm not sure if I know exactly what could be responsible for

Re: status of 0.8.2

2014-07-08 Thread Jason Rosenberg
for trunk. Thanks, Jun On Mon, Jul 7, 2014 at 7:31 AM, Jason Rosenberg j...@squareup.com wrote: What's the status for an 0.8.2 release? We are currently using 0.8.0, and would like to upgrade to take advantage of some of the per-topic retention options available now in 0.8.1

Re: status of 0.8.2

2014-07-08 Thread Jason Rosenberg
Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop / On Tue, Jul 8, 2014 at 10:45 AM, Jason Rosenberg j...@squareup.com wrote: Is there a blocker to getting the patch

Re: kafka 0.8.1.1 log.retention.minutes NOT being honored

2014-07-08 Thread Jason Rosenberg
using either granularity. Guozhang On Tue, Jul 8, 2014 at 1:11 PM, Jason Rosenberg j...@squareup.com wrote: On a related note, in doing the upgrade from 0.8.0, I noticed that the config property changed from 'log.retention.hours' to 'log.retention.minutes'. Would it have made more sense

Re: quick question about new consumer api

2014-07-07 Thread Jason Rosenberg
require single-partition topics? Guozhang On Mon, Jul 7, 2014 at 7:43 AM, Jason Rosenberg j...@squareup.com wrote: I've been looking at the new consumer api outlined here: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design One issue in the current

Re: quick question about new consumer api

2014-07-07 Thread Jason Rosenberg
. Guozhang On Mon, Jul 7, 2014 at 8:44 AM, Jason Rosenberg j...@squareup.com wrote: Guozhang, I'm not suggesting we parallelize within a partition The problem with the current high-level consumer is, if you use a regex to select multiple topics, and then have multiple consumers

Re: 0.7 - 0.8 Protocol Upgrade in production environments

2014-01-21 Thread Jason Rosenberg
In my case, we just rolled out a separate 0.8 cluster, and migrated producers to it over time (took several weeks to get everything updated to the new cluster). In the transition, we had consumers running for both clusters. Once no traffic was flowing on the old cluster, we then shut down the

Re: Patterns for message failure handling with Kafka

2014-01-21 Thread Jason Rosenberg
So, I think there are 2 different types of errors you mention. The first is data-dependent (e.g. it's corrupt or some such). So, there's no reason to block consumption of other messages that are likely to be successful, while the data-dependent one won't fix itself no matter how many times you retry.

Re: log.retention.bytes.per.topic does not trigger deletion

2014-01-19 Thread Jason Rosenberg
Please be sure to update the online config docs with this change! The per topic options are still listed there Jason On Thu, Jan 16, 2014 at 9:57 PM, Ben Summer bsum...@gnipcentral.com wrote: I see. I don't have version 0.8.1 yet. We just updated to 0.8.0 from beta after it became the

Re: Consumers can't connect while broker is under load

2014-01-14 Thread Jason Rosenberg
12 zookeepers seems like a lot... and you should always, by default, prefer an odd number of zookeepers. Consumers negotiate with each other for partition ownership, via zookeeper. Jason On Fri, Jan 10, 2014 at 9:20 PM, Guozhang Wang wangg...@gmail.com wrote: Can you post the consumer
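
The preference for an odd ensemble size can be illustrated with a little arithmetic. This is a sketch assuming the usual strict-majority quorum rule; `tolerated_failures` is a hypothetical helper, not part of any ZooKeeper or Kafka API.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a strict majority is still alive."""
    quorum = ensemble_size // 2 + 1  # smallest strict majority
    return ensemble_size - quorum


for n in (3, 4, 5, 11, 12):
    print(f"{n} servers: quorum {n // 2 + 1}, tolerates {tolerated_failures(n)} failures")
```

Under that assumption, 12 servers tolerate the same 5 failures as 11, so the extra server adds coordination overhead without adding fault tolerance.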

Re: understanding OffsetOutOfRangeException's....

2014-01-12 Thread Jason Rosenberg
Not sure, but I'll try (it's a bit difficult to create a test-case, because it requires a good bit of integration testing, etc.). Jason On Sat, Jan 11, 2014 at 12:06 AM, Jun Rao jun...@gmail.com wrote: Do you think you can reproduce this easily? Thanks, Jun

Re: understanding OffsetOutOfRangeException's....

2014-01-10 Thread Jason Rosenberg
10, 2014 at 11:06 AM, Jun Rao jun...@gmail.com wrote: Could you increase parallelism on the consumers? Thanks, Jun On Thu, Jan 9, 2014 at 1:22 PM, Jason Rosenberg j...@squareup.com wrote: The consumption rate is a little better after the refactoring. The main issue though, was that we

Re: understanding OffsetOutOfRangeException's....

2014-01-09 Thread Jason Rosenberg
the refactoring? Thanks, Jun On Wed, Jan 8, 2014 at 10:44 AM, Jason Rosenberg j...@squareup.com wrote: Yes, it's happening continuously, at the moment (although I'm expecting the consumer to catch up soon) It seemed to start happening after I refactored the consumer app to use

Re: understanding OffsetOutOfRangeException's....

2014-01-08 Thread Jason Rosenberg
warning. The offset mismatch error should never happen. It could be that OffsetOutOfRangeException exposed a bug. Do you think you can reproduce this easily? Thanks, Jun On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg j...@squareup.com wrote: Jun, I'm not sure I understand your

Re: understanding OffsetOutOfRangeException's....

2014-01-08 Thread Jason Rosenberg
, Jan 8, 2014 at 1:44 PM, Jason Rosenberg j...@squareup.com wrote: Yes, it's happening continuously, at the moment (although I'm expecting the consumer to catch up soon) It seemed to start happening after I refactored the consumer app to use multiple consumer connectors in the same

Re: understanding OffsetOutOfRangeException's....

2014-01-07 Thread Jason Rosenberg
data if I see the second ERROR log line? Jason On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg j...@squareup.com wrote: But I assume this is not something you'd normally want to log (every incoming producer request?). Maybe just for debugging? Or is it only for consumer fetch requests? On Tue

Re: understanding OffsetOutOfRangeException's....

2014-01-07 Thread Jason Rosenberg
...@gmail.com wrote: The WARN and ERROR may not be completely correlated. Could it be that the consumer is slow and couldn't keep up with the produced data? Thanks, Jun On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg j...@squareup.com wrote: So, sometimes I just get the WARN from

Re: problem with high-level consumer stream filter regex....

2014-01-03 Thread Jason Rosenberg
Thanks Joe, I can confirm that your patch works for me, as applied to 0.8.0. Jason On Fri, Dec 20, 2013 at 6:28 PM, Jason Rosenberg j...@squareup.com wrote: Thanks Joe, I generally build locally, and upload to our maven proxy (using a custom pom). I haven't yet had luck using maven

Re: which zookeeper version

2014-01-02 Thread Jason Rosenberg
Hi Pushkar, We've been using zk 3.4.5 for several months now, without any problems, in production. Jason On Thu, Jan 2, 2014 at 1:15 AM, pushkar priyadarshi priyadarshi.push...@gmail.com wrote: Hi, I am starting a fresh deployment of kafka + zookeeper. Looking at zookeeper releases find

Re: Understanding the min fetch rate metric

2013-12-26 Thread Jason Rosenberg
topics whose leader is on that broker. We have seen a fetcher being killed by a bug in Kafka. Also, if the broker is slow (e.g. due to I/O contention), the fetch rate could also be slower than expected. Thanks, Jun On Tue, Dec 24, 2013 at 12:48 PM, Jason Rosenberg j...@squareup.com wrote

Re: Understanding the min fetch rate metric

2013-12-24 Thread Jason Rosenberg
updated http://kafka.apache.org/documentation.html#monitoring Thanks, Jun On Mon, Dec 23, 2013 at 10:51 PM, Jason Rosenberg j...@squareup.com wrote: I'm realizing I'm not quite sure what the 'min fetch rate' metric is indicating, for consumers. Can someone offer an explanation
