Thanks for the follow-up and the bug report, David.

We're taking a look at that.



On Mon, Oct 10, 2016 at 4:36 PM, David Garcia <dav...@spiceworks.com> wrote:

> Thx for the responses.  I was able to identify a bug in how the times are
> obtained (offsets resolved as unknown cause the issue):
>
> “Actually, I think the bug is more subtle.  What happens when a consumed
> topic stops receiving messages?  The smallest timestamp will always be the
> static timestamp of this topic.
>
> -David
>
> On 10/7/16, 5:03 PM, "David Garcia" <dav...@spiceworks.com> wrote:
>
>     Ok I found the bug.  Basically, if there is an empty topic (in the
>     list of topics being consumed), any partition-group with partitions from
>     the topic will always return -1 as the smallest timestamp (see
>     PartitionGroup.java).
>
>     To reproduce, simply start a kstreams consumer with one or more empty
>     topics.  Punctuate will never be called.
>
>     -David ”
>
> On 10/10/16, 1:55 AM, "Michael Noll" <mich...@confluent.io> wrote:
>
>     > We have run the application (and have confirmed data is being received)
>     > for over 30 mins…with a 60-second timer.
>
>     Ok, so your app does receive data but punctuate() still isn't being called.
>     :-(
>
>
>     > So, do we need to just rebuild our cluster with bigger machines?
>
>     That's worth trying out.  See
>     http://www.confluent.io/blog/design-and-deployment-considerations-for-deploying-apache-kafka-on-aws/
>     for some EC2 instance type recommendations.
>
>     But I'd also suggest looking into the log files of (1) your application,
>     (2) the Kafka broker(s), and (3) ZooKeeper to see whether anything looks
>     suspicious.
>
>     Sorry for not being able to provide more actionable feedback at this
>     point.  We have seen such issues mostly (but not exclusively) in cases
>     where there have been problems in the environment in which your
>     application is running and/or the environment of the Kafka clusters.
>     Unfortunately these environment problems are a bit tricky to debug
>     remotely via the mailing list.
>
>     -Michael
>
>
>
>
>
>     On Fri, Oct 7, 2016 at 8:11 PM, David Garcia <dav...@spiceworks.com> wrote:
>
>     > Yeah, this is possible.  We have run the application (and have confirmed
>     > data is being received) for over 30 mins…with a 60-second timer.  So, do
>     > we need to just rebuild our cluster with bigger machines?
>     >
>     > -David
>     >
>     > On 10/7/16, 11:18 AM, "Michael Noll" <mich...@confluent.io> wrote:
>     >
>     >     David,
>     >
>     >     punctuate() is still data-driven at this point, even when you're
>     >     using the WallClock timestamp extractor.
>     >
>     >     To use an example: Imagine you have configured punctuate() to be run
>     >     every 5 seconds.  If there's no data being received for a minute, then
>     >     punctuate won't be called -- even though you probably would have
>     >     expected this to happen 12 times during this 1 minute.
>     >
>     >     (FWIW, there's an ongoing discussion to improve punctuate(), part of
>     >     which is motivated by the current behavior that arguably is not very
>     >     intuitive to many users.)
>     >
>     >     Could this be the problem you're seeing?  See also the related
>     >     discussion at
>     >     http://stackoverflow.com/questions/39535201/kafka-problems-with-timestampextractor
>     >
>     >
>     >
>     >
>     >
>     >
>     >     On Fri, Oct 7, 2016 at 6:07 PM, David Garcia <dav...@spiceworks.com> wrote:
>     >
>     >     > Hello, I’m sure this question has been asked many times.
>     >     > We have a test-cluster (confluent 3.0.0 release) of 3 aws
>     >     > m4.xlarges.  We have an application that needs to use the
>     >     > punctuate() function to do some work on a regular interval.  We are
>     >     > using the WallClock extractor.  Unfortunately, the method is never
>     >     > called.  I have checked the filedescriptor setting for both the user
>     >     > as well as the process, and everything seems to be fine.  Is this a
>     >     > known bug, or is there something obvious I’m missing?
>     >     >
>     >     > One note, the application used to work on this cluster, but now it’s
>     >     > not working.  Not really sure what is going on?
>     >     >
>     >     > -David
>     >     >
>     >
>     >
>     >
>
>
>
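
For readers following the thread, here is a minimal sketch of the empty-topic
behaviour David describes above. It is a simplified model, not the actual
PartitionGroup code. It assumes that a partition with no records reports -1,
that the group's time is the smallest timestamp across its partitions, and
that punctuation fires only when that time reaches the next due point. The
class and method names (PunctuateSketch, smallestTimestamp,
countPunctuations) are hypothetical, and the catch-up loop is a modelling
choice of this sketch.

```java
public class PunctuateSketch {
    // Assumed marker for "no timestamp known yet" (an empty partition).
    public static final long UNKNOWN = -1L;

    // Smallest timestamp across the partitions of one group.
    // A single UNKNOWN (-1) entry drags the result down to -1.
    public static long smallestTimestamp(long[] partitionTimestamps) {
        if (partitionTimestamps.length == 0) {
            return UNKNOWN;
        }
        long min = Long.MAX_VALUE;
        for (long ts : partitionTimestamps) {
            min = Math.min(min, ts);
        }
        return min;
    }

    // Counts how often punctuation would fire in this model. Each row of
    // 'rounds' holds the per-partition timestamps seen in one processing round.
    public static int countPunctuations(long[][] rounds, long intervalMs) {
        long nextDue = intervalMs; // first punctuation is due after one interval
        int fired = 0;
        for (long[] partitionTimestamps : rounds) {
            long streamTime = smallestTimestamp(partitionTimestamps);
            // Fire once per interval that the group's time has passed.
            while (streamTime >= nextDue) {
                fired++;
                nextDue += intervalMs;
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        long[][] healthy = {{1000, 1000}, {5000, 5500}, {10000, 11000}};
        long[][] withEmptyTopic = {{1000, UNKNOWN}, {5000, UNKNOWN}, {10000, UNKNOWN}};
        System.out.println("healthy group:    " + countPunctuations(healthy, 5000));
        System.out.println("with empty topic: " + countPunctuations(withEmptyTopic, 5000));
    }
}
```

In this model the group that includes an empty partition never advances past
-1, so no punctuation is ever due, which matches the symptom reported in the
thread.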
