Guozhang,

Will do; if it gets stuck in this loop again I'll inspect the broker log dirs. I'm running the 0.11 release right now.
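Roughly the kind of inspection I have in mind, in case it helps anyone following along (a sketch only: /tmp/kafka-logs is just the default log.dirs, and the segment file name is illustrative):

  # list whatever segment files remain for the partition
  ls -l /tmp/kafka-logs/foo-0/

  # dump a segment to see which offset ranges, if any, it still holds
  bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
    --files /tmp/kafka-logs/foo-0/00000000000000000000.log \
    --print-data-log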
On Mon, Aug 14, 2017 at 4:22 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> Garrett,
>
> What confuses me is that you mention it starts spamming the logs, meaning
> that it falls into this endless loop:
>
> 1) getting an out-of-range exception
> 2) resetting the offset by querying the broker for the offset
> 3) getting offset 0 from the broker
> 4) sending a fetch request starting at 0, and getting the out-of-range
> exception again
>
> So I'd like to ask you to do me a favor if you see this again in your
> test environment: go into the broker log directory and check whether any
> log segment files remain for partition foo-0, and if so, whether those
> segments contain any data (using the DumpLogSegments tool) and with what
> offset ranges.
>
> From the code path, the broker should maintain at least one empty segment
> even if all data gets truncated (in trunk, at least), but I'm not sure
> whether you are running an older version that may have a bug in the
> broker log handling.
>
> Guozhang
>
> On Mon, Aug 14, 2017 at 6:28 AM, Garrett Barton <garrett.bar...@gmail.com>
> wrote:
>
> > Guozhang,
> >
> > Thanks for the reply! Based on what you said I am going to increase
> > log.retention.hours a bunch and see what happens. Things typically break
> > long before 48 hours, but you're right, the data could have expired by
> > then too. I'll pay attention to that as well.
> >
> > As far as messing with the offsets: I do nothing external to reset
> > offsets; Streams is managing things itself. This really only seems to
> > happen when Streams does not have data to process. In the real system,
> > which is fed data all the time, I don't have this issue.
> >
> > On Sun, Aug 13, 2017 at 8:46 PM, Guozhang Wang <wangg...@gmail.com>
> > wrote:
> >
> > > Hi Garrett,
> > >
> > > Since your error message says "offset X" is out of range, it means
> > > the offset was reset because there was no data left on topic
> > > partition "foo-0". I suspect all the log segments got truncated and
> > > the topic partition is now empty. It is less likely caused by
> > > KAFKA-5510, and hence offsets.retention.minutes may not help here.
> > >
> > > Since you mentioned that setting log.retention.hours=48 does not
> > > help, and that the input sample data may be loaded a day or two
> > > before the new build goes out, I suspect some messages with
> > > timestamps older than 48 hours are published to the log, causing it
> > > to roll new segments that get deleted immediately. Note that the
> > > Kafka brokers use the current system time to compute the difference
> > > against the message timestamps. If that is the case, it is not a
> > > Streams issue, not even a general Consumer issue, but a Kafka
> > > broker-side log retention operation.
> > >
> > > What I'm not clear on is that in your error message "X" is actually
> > > 0: it is quite weird for a consumer to auto-reset its position to 0.
> > > Did you run some tool periodically to reset the offset to 0?
> > >
> > > Guozhang
> > >
> > > On Wed, Aug 9, 2017 at 7:16 AM, Garrett Barton
> > > <garrett.bar...@gmail.com> wrote:
> > >
> > > > I have a small test setup with a local zk/kafka server and a
> > > > streams app that loads sample data. The test setup is usually up
> > > > for a day or two before a new build goes out, and then it's blown
> > > > away and loaded from scratch.
> > > > Lately I've seen that after a few hours the stream app will stop
> > > > processing and start spamming the logs with:
> > > >
> > > > org.apache.kafka.clients.consumer.internals.Fetcher: Fetch Offset 0
> > > > is out of range for partition foo-0, resetting offset
> > > > org.apache.kafka.clients.consumer.internals.Fetcher: Fetch Offset 0
> > > > is out of range for partition foo-0, resetting offset
> > > > org.apache.kafka.clients.consumer.internals.Fetcher: Fetch Offset 0
> > > > is out of range for partition foo-0, resetting offset
> > > >
> > > > It pretty much sinks a core into spamming the logs.
> > > >
> > > > Restarting the application puts it right back in that broken state.
> > > >
> > > > I thought it was because of this:
> > > > https://issues.apache.org/jira/browse/KAFKA-5510
> > > > so I set log.retention.hours=48 and offsets.retention.minutes=10081,
> > > > which is huge compared to the total data retention time. Yet the
> > > > same error occurred.
> > > >
> > > > Any ideas?
> > >
> > > --
> > > -- Guozhang
>
> --
> -- Guozhang
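For reference, the broker settings discussed above would sit in server.properties roughly as follows. The first two values are taken from the thread; log.message.timestamp.type is an assumption, one possible guard against the old-timestamp truncation Guozhang describes, not something confirmed in the thread:

  # keep topic data for 48 hours
  log.retention.hours=48
  # keep committed consumer offsets for roughly 7 days
  offsets.retention.minutes=10081
  # assumption: retain by broker append time so replayed records carrying
  # old embedded timestamps do not land in segments that are deleted
  # immediately (the default is CreateTime)
  log.message.timestamp.type=LogAppendTime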