Re: Replication does not start because of OutOfOrderSequenceException

2017-10-02 Thread Stas Chizhov
Hi, I've created a ticket for the situation we have now:
https://issues.apache.org/jira/browse/KAFKA-6003. I will file a ticket for
the original exception that took down the replica fetcher thread after some
initial investigation - it might be the same issue after all.

I would still appreciate any hints on how to get those topics into a fully
replicated state without losing all the data.
Would turning off idempotence on the producers and waiting until all old data
is cleaned up help?

Best regards,
Stas.
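
For context on the idempotence question above, here is a minimal sketch of what turning it off on a producer amounts to. `enable.idempotence` is the actual producer setting; the helper name `producer_properties` and the broker address are hypothetical, and nothing in this thread confirms that disabling it is enough to let the replicas catch up.

```python
def producer_properties(bootstrap_servers, idempotent):
    """Render a minimal producer .properties body as a string.

    The helper name and its arguments are illustrative assumptions;
    only the two config keys are real Kafka producer settings.
    """
    props = {
        "bootstrap.servers": bootstrap_servers,
        # "false" turns the idempotent producer off.
        "enable.idempotence": "true" if idempotent else "false",
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


if __name__ == "__main__":
    # Hypothetical broker address, for illustration only.
    print(producer_properties("broker1:9092", idempotent=False), end="")
```

Data that idempotent producers have already written stays in the log until retention removes it, which is why the question pairs this change with waiting for cleanup.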

2017-10-02 20:08 GMT+02:00 Apurva Mehta:

> Hi Stas,
>
> Thanks for reporting this. It would be helpful to have a JIRA with more of
> the server logs on the leaders and followers in the time leading up to this
> OutOfOrderSequenceException.
>
> The answers to the following questions would help when you file the JIRA:
>
> What are the retention settings for this topic? Is it configured for
> compaction? Compaction and deletion? What is the retention.ms setting?
> What is the retention.bytes setting? What messages are being written to the
> topic? In particular, do they have a create time explicitly set by the
> application?
>
> Thanks,
> Apurva
>
> On Mon, Oct 2, 2017 at 4:40 AM, Ismael Juma wrote:
>
> > Hi Stas,
> >
> > Thank you for reporting this. Can you please file an issue? Even if
> > KAFKA-5793 has fixed it for 1.0.0 (which needs to be verified), we should
> > consider whether a fix is needed for the 0.11.0 branch as well.
> >
> > Ismael
> >
> > On Mon, Oct 2, 2017 at 11:28 AM, Stas Chizhov wrote:
> >
> > > Hi,
> > >
> > > We run 0.11.0.1 and there was a problem with one replica fetcher
> > > thread on one of the brokers - it experienced an out-of-order
> > > sequence problem for one topic/partition and was stopped. It stayed
> > > stopped over the weekend. During this time log cleanup was working
> > > and by now it has cleaned up all the data in the partitions that
> > > this fetcher was responsible for - including other partitions that
> > > didn't have the out-of-order sequence problem in the first place. It
> > > is not completely clear to me why this initial problem occurred, but
> > > at this moment there is a broker with no data for a few partitions
> > > and the replica fetcher fails upon restart with
> > > "org.apache.kafka.common.errors.OutOfOrderSequenceException: Invalid
> > > sequence number for new epoch: 0 (request epoch), 154277489 (seq.
> > > number)". I believe this is
> > > https://issues.apache.org/jira/browse/KAFKA-5793.
> > > However, I wonder what the easiest way is to bring these replicas
> > > back online?
> > >
> > > Best regards,
> > > Stanislav.
> > >
> >
>


Re: Replication does not start because of OutOfOrderSequenceException

2017-10-02 Thread Apurva Mehta
Hi Stas,

Thanks for reporting this. It would be helpful to have a JIRA with more of
the server logs on the leaders and followers in the time leading up to this
OutOfOrderSequenceException.

The answers to the following questions would help when you file the JIRA:

What are the retention settings for this topic? Is it configured for
compaction? Compaction and deletion? What is the retention.ms setting?
What is the retention.bytes setting? What messages are being written to the
topic? In particular, do they have a create time explicitly set by the
application?

Thanks,
Apurva
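
The retention questions above map onto three topic-level settings: `cleanup.policy`, `retention.ms` and `retention.bytes`. As a rough sketch, the helper below (the name `summarize_retention` and the sample values are hypothetical) takes a topic's config overrides as a plain dict and reports an answer to each question. A value of `None` means the topic does not override the setting, so the broker default applies.

```python
def summarize_retention(topic_config):
    """Summarize the retention-related overrides of one topic.

    `topic_config` maps config names to string values, for example as
    copied by hand from the topic's described configuration.
    """
    policy = topic_config.get("cleanup.policy")
    if policy is None:
        compaction = deletion = None  # not overridden on the topic
    else:
        # cleanup.policy may be "delete", "compact" or a comma-separated pair.
        policies = {part.strip() for part in policy.split(",")}
        compaction = "compact" in policies
        deletion = "delete" in policies
    return {
        "compaction": compaction,
        "deletion": deletion,
        "retention.ms": topic_config.get("retention.ms"),
        "retention.bytes": topic_config.get("retention.bytes"),
    }


if __name__ == "__main__":
    # Made-up overrides, not the settings of the topic in this thread.
    print(summarize_retention({"cleanup.policy": "compact,delete",
                               "retention.ms": "86400000"}))
```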

On Mon, Oct 2, 2017 at 4:40 AM, Ismael Juma wrote:

> Hi Stas,
>
> Thank you for reporting this. Can you please file an issue? Even if
> KAFKA-5793 has fixed it for 1.0.0 (which needs to be verified), we should
> consider whether a fix is needed for the 0.11.0 branch as well.
>
> Ismael
>
> On Mon, Oct 2, 2017 at 11:28 AM, Stas Chizhov wrote:
>
> > Hi,
> >
> > We run 0.11.0.1 and there was a problem with one replica fetcher
> > thread on one of the brokers - it experienced an out-of-order
> > sequence problem for one topic/partition and was stopped. It stayed
> > stopped over the weekend. During this time log cleanup was working
> > and by now it has cleaned up all the data in the partitions that
> > this fetcher was responsible for - including other partitions that
> > didn't have the out-of-order sequence problem in the first place. It
> > is not completely clear to me why this initial problem occurred, but
> > at this moment there is a broker with no data for a few partitions
> > and the replica fetcher fails upon restart with
> > "org.apache.kafka.common.errors.OutOfOrderSequenceException: Invalid
> > sequence number for new epoch: 0 (request epoch), 154277489 (seq.
> > number)". I believe this is
> > https://issues.apache.org/jira/browse/KAFKA-5793.
> > However, I wonder what the easiest way is to bring these replicas
> > back online?
> >
> > Best regards,
> > Stanislav.
> >
>


Re: Replication does not start because of OutOfOrderSequenceException

2017-10-02 Thread Ismael Juma
Hi Stas,

Thank you for reporting this. Can you please file an issue? Even if
KAFKA-5793 has fixed it for 1.0.0 (which needs to be verified), we should
consider whether a fix is needed for the 0.11.0 branch as well.

Ismael

On Mon, Oct 2, 2017 at 11:28 AM, Stas Chizhov wrote:

> Hi,
>
> We run 0.11.0.1 and there was a problem with one replica fetcher
> thread on one of the brokers - it experienced an out-of-order
> sequence problem for one topic/partition and was stopped. It stayed
> stopped over the weekend. During this time log cleanup was working
> and by now it has cleaned up all the data in the partitions that
> this fetcher was responsible for - including other partitions that
> didn't have the out-of-order sequence problem in the first place. It
> is not completely clear to me why this initial problem occurred, but
> at this moment there is a broker with no data for a few partitions
> and the replica fetcher fails upon restart with
> "org.apache.kafka.common.errors.OutOfOrderSequenceException: Invalid
> sequence number for new epoch: 0 (request epoch), 154277489 (seq.
> number)". I believe this is
> https://issues.apache.org/jira/browse/KAFKA-5793.
> However, I wonder what the easiest way is to bring these replicas
> back online?
>
> Best regards,
> Stanislav.
>


Replication does not start because of OutOfOrderSequenceException

2017-10-02 Thread Stas Chizhov
Hi,

We run 0.11.0.1 and there was a problem with one replica fetcher
thread on one of the brokers - it experienced an out-of-order
sequence problem for one topic/partition and was stopped. It stayed
stopped over the weekend. During this time log cleanup was working
and by now it has cleaned up all the data in the partitions that
this fetcher was responsible for - including other partitions that
didn't have the out-of-order sequence problem in the first place. It
is not completely clear to me why this initial problem occurred, but
at this moment there is a broker with no data for a few partitions
and the replica fetcher fails upon restart with
"org.apache.kafka.common.errors.OutOfOrderSequenceException: Invalid
sequence number for new epoch: 0 (request epoch), 154277489 (seq.
number)". I believe this is
https://issues.apache.org/jira/browse/KAFKA-5793.
However, I wonder what the easiest way is to bring these replicas
back online?

Best regards,
Stanislav.
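
The exception text quoted above carries two numbers that are worth pulling out when collecting logs for a ticket: the request epoch and the sequence number. The sketch below extracts them with a regular expression; the function name `parse_new_epoch_error` is hypothetical, and the pattern is based only on the single message quoted in this thread. Going by the wording, the broker appears to have treated the producer epoch as new and rejected a first sequence number that was not 0, but that reading is an assumption that the JIRA should confirm.

```python
import re

# Pattern derived from the one message quoted in this thread; other
# broker versions may word it differently.
_NEW_EPOCH_ERROR = re.compile(
    r"Invalid\s+sequence\s+number\s+for\s+new\s+epoch:\s+(\d+)\s+"
    r"\(request\s+epoch\),\s+(\d+)\s+\(seq\.\s+number\)"
)


def parse_new_epoch_error(message):
    """Return (request_epoch, sequence_number), or None if no match."""
    match = _NEW_EPOCH_ERROR.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    line = ("org.apache.kafka.common.errors.OutOfOrderSequenceException: "
            "Invalid sequence number for new epoch: 0 (request epoch), "
            "154277489 (seq. number)")
    print(parse_new_epoch_error(line))
```

Whitespace is matched with `\s+` throughout so that a message wrapped across lines, as it is in the mail above, still parses.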