The WARN and ERROR may not be completely correlated. Could it be that the consumer is slow and couldn't keep up with the produced data?
Thanks, Jun On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg <j...@squareup.com> wrote: > So, sometimes I just get the WARN from the ConsumerFetcherThread (as > previously noted, above), e.g.: > > 2014-01-08 02:31:47,394 WARN [ConsumerFetcherThread-myconsumerapp-11] > consumer.ConsumerFetcherThread - > [ConsumerFetcherThread-myconsumerapp-11], Current offset 16163904970 > for partition [mypartition,0] out of range; reset offset to > 16175326044 > > More recently, I see these in the following log line (not sure why I > didn't see it previously), coming from the ConsumerIterator: > > 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0] > consumer.ConsumerIterator - consumed offset: 16163904970 doesn't match > fetch offset: 16175326044 for mytopic:0: fetched offset = 16175330598: > consumed offset = 16163904970; > Consumer may lose data > > Why would I not see this second ERROR everytime there's a > corresponding WARN on the FetcherThread for an offset reset? > > Should I only be concerned about possible lost data if I see the > second ERROR log line? > > Jason > > On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg <j...@squareup.com> wrote: > > But I assume this would not be normally you'd want to log (every > > incoming producer request?). Maybe just for debugging? Or is it only > > for consumer fetch requests? > > > > On Tue, Dec 24, 2013 at 12:50 PM, Guozhang Wang <wangg...@gmail.com> > wrote: > >> TRACE is lower than INFO so INFO level request logging would also be > >> recorded. > >> > >> You can check for "Completed XXX request" in the log files to check the > >> request info with the correlation id. > >> > >> Guozhang > >> > >> > >> On Mon, Dec 23, 2013 at 10:46 PM, Jason Rosenberg <j...@squareup.com> > wrote: > >> > >>> Hmmm, it looks like I'm enabling all logging at INFO, and the request > >>> logging is only done at TRACE (why is that?). > >>> > >>> I suppose one wouldn't normally want to see request logs, so by > default, > >>> they aren't enabled? > >>> > >>> > >>> On Mon, Dec 23, 2013 at 11:46 PM, Jun Rao <jun...@gmail.com> wrote: > >>> > >>> > Did you enable request log? It logs the ip of every request. > >>> > > >>> > Thanks, > >>> > > >>> > Jun > >>> > > >>> > > >>> > On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg <j...@squareup.com> > >>> wrote: > >>> > > >>> > > Hi Guozhang, > >>> > > > >>> > > I'm not sure I understand your first answer. I don't see anything > >>> > > regarding the correlation id, elsewhere in the broker logs.....They > >>> only > >>> > > show up in those ERROR messages.... > >>> > > > >>> > > I do see correlation id's in clients, but not on the broker..... > >>> > > > >>> > > Jason > >>> > > > >>> > > > >>> > > On Mon, Dec 23, 2013 at 6:46 PM, Guozhang Wang <wangg...@gmail.com > > > >>> > wrote: > >>> > > > >>> > > > Jason, > >>> > > > > >>> > > > You can search the correlation id in the public access log on the > >>> > servers > >>> > > > to get the consumer information. > >>> > > > > >>> > > > As for logging, I agree that we should use the same level on both > >>> > sides. > >>> > > > Could you file a jira for this? > >>> > > > > >>> > > > Guozhang > >>> > > > > >>> > > > > >>> > > > On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg < > j...@squareup.com> > >>> > > wrote: > >>> > > > > >>> > > > > In our broker logs, we occasionally see errors like this: > >>> > > > > > >>> > > > > 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2] > >>> > > server.KafkaApis > >>> > > > - > >>> > > > > [KafkaApi-45] Error when processing fetch request for partition > >>> > > > [mytopic,0] > >>> > > > > offset 204243601 from consumer with correlation id 130341 > >>> > > > > kafka.common.OffsetOutOfRangeException: Request for offset > >>> 204243601 > >>> > > but > >>> > > > we > >>> > > > > only have log segments in the range 204343397 to 207423640. > >>> > > > > > >>> > > > > I assume this means there's a consumer that has fallen behind > >>> > consuming > >>> > > > > messages, and the log retention policy has removed messages > before > >>> > they > >>> > > > > could be consumed by the consumer. > >>> > > > > > >>> > > > > However, I'm not 100% which consumer it is, and it looks like > the > >>> > only > >>> > > > info > >>> > > > > we have is the correlation id of the consumer, e.g.: > >>> > > > > > >>> > > > > "from consumer with correlation id 130341" > >>> > > > > > >>> > > > > Is there a way to know which consumer this refers to? It seems > >>> there > >>> > > are > >>> > > > > far more correlation id's than there are consumers. Would it > be > >>> > > possible > >>> > > > > to provide a bit more descriptive error message here, so we can > >>> > > > immediately > >>> > > > > know which consumer is falling behind? > >>> > > > > > >>> > > > > We do see a corresponding entry in the consumer logs too: > >>> > > > > > >>> > > > > 2013-12-23 05:02:08,797 WARN > >>> > > > > > [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45] > >>> > > > > consumer.ConsumerFetcherThread - > >>> > > > > > >>> [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45], > >>> > > > > Current offset 204243601 for partition [mytopic,0] out of > range; > >>> > reset > >>> > > > > offset to 204343397 > >>> > > > > > >>> > > > > But it would be nice to be able to also use the broker log to > >>> quickly > >>> > > > find > >>> > > > > consumers with issues. > >>> > > > > > >>> > > > > Also, I'm not sure, why is logging the event as an ERROR in the > >>> > broker, > >>> > > > but > >>> > > > > a WARN in the consumer? > >>> > > > > > >>> > > > > Jason > >>> > > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > -- > >>> > > > -- Guozhang > >>> > > > > >>> > > > >>> > > >>> > >> > >> > >> > >> -- > >> -- Guozhang >