Re: AvroIO.to(DynamicAvroDestinations) deprecated?

2022-09-13 Thread John Casey via user
That would be great, thanks!

On Tue, Sep 13, 2022 at 3:00 PM Steve Niemitz  wrote:

> Ah this is super useful context, thank you!  I can submit a couple PRs to
> get AvroIO.sink up to parity if that's the way forward.
>
> On Tue, Sep 13, 2022 at 2:53 PM John Casey via user 
> wrote:
>
>> Hi Steve,
>>
>> I've asked around, and it looks like this confusing state is due to a
>> migration that isn't complete (and likely won't be until Beam 3.0).
>>
>> Here is the doc that explains some of the history:
>> https://docs.google.com/document/d/1zcF4ZGtq8pxzLZxgD_JMWAouSszIf9LnFANWHKBsZlg/edit
>> And a PR that implements some of the changes:
>> https://github.com/apache/beam/pull/3817
>>
>> Based on this, AvroIO.sink is what we recommend. Please feel free to
>> raise issues on Github to account for features you're missing. In addition,
>> if you think they are straightforward changes, I'd be happy to discuss
>> designs, or look at proposed changes to make these features available.
>>
>> I hope this helps,
>> John
>>
>> On Mon, Sep 12, 2022 at 3:38 PM Steve Niemitz 
>> wrote:
>>
>>> We're trying to do some semi-advanced custom logic (custom writers and
>>> schemas per destination) with AvroIO, and want to use
>>> DynamicAvroDestinations to accomplish this.
>>>
>>> However, AvroIO.to(DynamicAvroDestinations) is deprecated, but there
>>> doesn't seem to be any other way to accomplish what we want here.
>>> AvroIO.sink is much less sophisticated than the non-sink options, missing
>>> much of the configurability that the non-sink version has.  For example,
>>> there's no way to project from UserT -> OutputT with the sink version,
>>> only from UserT -> GenericRecord, which isn't what we want.
>>>
>>> It seems like most things would be trivial to fix or add in the
>>> AvroIO.sink implementation; is that the intended way that people are
>>> expected to consume AvroIO?  I'm a little confused by
>>> FileIO.write/writeDynamic vs WriteFiles vs AvroIO.write: some seem
>>> deprecated, and some seem not-deprecated-but-not-recommended.  To add to
>>> the confusion, AvroIO.write uses WriteFiles, but the documentation for
>>> the deprecated AvroIO.to(DynamicAvroDestinations) points to FileIO.write.
>>> Which is the "right" one to use?
>>>
>>


Re: AvroIO.to(DynamicAvroDestinations) deprecated?

2022-09-13 Thread Steve Niemitz
Ah this is super useful context, thank you!  I can submit a couple PRs to
get AvroIO.sink up to parity if that's the way forward.

On Tue, Sep 13, 2022 at 2:53 PM John Casey via user 
wrote:

> Hi Steve,
>
> I've asked around, and it looks like this confusing state is due to a
> migration that isn't complete (and likely won't be until Beam 3.0).
>
> Here is the doc that explains some of the history:
> https://docs.google.com/document/d/1zcF4ZGtq8pxzLZxgD_JMWAouSszIf9LnFANWHKBsZlg/edit
> And a PR that implements some of the changes:
> https://github.com/apache/beam/pull/3817
>
> Based on this, AvroIO.sink is what we recommend. Please feel free to raise
> issues on Github to account for features you're missing. In addition, if
> you think they are straightforward changes, I'd be happy to discuss
> designs, or look at proposed changes to make these features available.
>
> I hope this helps,
> John
>
> On Mon, Sep 12, 2022 at 3:38 PM Steve Niemitz  wrote:
>
>> We're trying to do some semi-advanced custom logic (custom writers and
>> schemas per destination) with AvroIO, and want to use
>> DynamicAvroDestinations to accomplish this.
>>
>> However, AvroIO.to(DynamicAvroDestinations) is deprecated, but there
>> doesn't seem to be any other way to accomplish what we want here.
>> AvroIO.sink is much less sophisticated than the non-sink options, missing
>> much of the configurability that the non-sink version has.  For example,
>> there's no way to project from UserT -> OutputT with the sink version,
>> only from UserT -> GenericRecord, which isn't what we want.
>>
>> It seems like most things would be trivial to fix or add in the
>> AvroIO.sink implementation; is that the intended way that people are
>> expected to consume AvroIO?  I'm a little confused by
>> FileIO.write/writeDynamic vs WriteFiles vs AvroIO.write: some seem
>> deprecated, and some seem not-deprecated-but-not-recommended.  To add to
>> the confusion, AvroIO.write uses WriteFiles, but the documentation for
>> the deprecated AvroIO.to(DynamicAvroDestinations) points to FileIO.write.
>> Which is the "right" one to use?
>>
>


Re: AvroIO.to(DynamicAvroDestinations) deprecated?

2022-09-13 Thread John Casey via user
Hi Steve,

I've asked around, and it looks like this confusing state is due to a
migration that isn't complete (and likely won't be until Beam 3.0).

Here is the doc that explains some of the history:
https://docs.google.com/document/d/1zcF4ZGtq8pxzLZxgD_JMWAouSszIf9LnFANWHKBsZlg/edit
And a PR that implements some of the changes:
https://github.com/apache/beam/pull/3817

Based on this, AvroIO.sink is what we recommend. Please feel free to raise
issues on GitHub for the features you're missing. In addition, if
you think they are straightforward changes, I'd be happy to discuss
designs, or look at proposed changes to make these features available.

I hope this helps,
John
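
For readers who land on this thread later, here is a rough sketch of the
FileIO.writeDynamic + AvroIO.sink pattern recommended above, in case it
helps. UserEvent, getDestination(), toGenericRecord(), and schemaFor() are
hypothetical placeholders for your own type and helpers, and the output path
and file naming are illustrative only; treat this as an outline under those
assumptions, not a drop-in replacement for AvroIO.to(DynamicAvroDestinations).

import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.AvroIO;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.Contextful;
import org.apache.beam.sdk.values.PCollection;

PCollection<UserEvent> events = ...;

events.apply(
    FileIO.<String, UserEvent>writeDynamic()
        // Route each element to a destination key (a plain String here).
        .by((UserEvent e) -> e.getDestination())
        .withDestinationCoder(StringUtf8Coder.of())
        .via(
            // UserT -> OutputT projection; here OutputT is GenericRecord.
            // AvroIO.sink(SomeGeneratedRecord.class) is the alternative for
            // generated Avro classes.
            Contextful.fn((UserEvent e) -> toGenericRecord(e)),
            // Per-destination sink, so each destination can carry its own schema.
            Contextful.fn((String dest) -> AvroIO.sink(schemaFor(dest))))
        .to("gs://my-bucket/output")
        .withNaming((String dest) -> FileIO.Write.defaultNaming(dest, ".avro")));

The two-argument via() is what provides both the UserT -> OutputT projection
and a per-destination sink/schema, which is the combination discussed above.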

On Mon, Sep 12, 2022 at 3:38 PM Steve Niemitz  wrote:

> We're trying to do some semi-advanced custom logic (custom writers and
> schemas per destination) with AvroIO, and want to use
> DynamicAvroDestinations to accomplish this.
>
> However, AvroIO.to(DynamicAvroDestinations) is deprecated, but there
> doesn't seem to be any other way to accomplish what we want here.
> AvroIO.sink is much less sophisticated than the non-sink options, missing
> much of the configurability that the non-sink version has.  For example,
> there's no way to project from UserT -> OutputT with the sink version,
> only from UserT -> GenericRecord, which isn't what we want.
>
> It seems like most things would be trivial to fix or add in the
> AvroIO.sink implementation; is that the intended way that people are
> expected to consume AvroIO?  I'm a little confused by
> FileIO.write/writeDynamic vs WriteFiles vs AvroIO.write: some seem
> deprecated, and some seem not-deprecated-but-not-recommended.  To add to
> the confusion, AvroIO.write uses WriteFiles, but the documentation for
> the deprecated AvroIO.to(DynamicAvroDestinations) points to FileIO.write.
> Which is the "right" one to use?
>


Re: [troubleshooting] KafkaIO#write gets stuck "since the associated topicId changed from null to "

2022-09-13 Thread Evan Galpin
There are a few related log lines, but there isn't a full stack trace, as the
info originates from a logger statement[1] as opposed to a thrown exception.
The related log lines are as follows:

org.apache.kafka.clients.NetworkClient [Producer clientId=producer-109]
Disconnecting from node 10 due to socket connection setup timeout. The
timeout value is 11436 ms.[2]

and

org.apache.kafka.clients.NetworkClient [Producer clientId=producer-109]
Node 10 disconnected.[3]

[1]
https://github.com/apache/kafka/blob/f653cb7b5889fd619ab0e6a25216bd981a9d82bf/clients/src/main/java/org/apache/kafka/clients/Metadata.java#L402
[2]
https://github.com/apache/kafka/blob/1135f22eaf404fdf76489302648199578876c4ac/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L820
[3]
https://github.com/apache/kafka/blob/1135f22eaf404fdf76489302648199578876c4ac/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L937

On Tue, Sep 13, 2022 at 12:17 PM Alexey Romanenko 
wrote:

> Do you have by any chance the full stacktrace of this error?
>
> —
> Alexey
>
> On 13 Sep 2022, at 18:05, Evan Galpin  wrote:
>
> Ya likewise, I'd expect this to be handled in the Kafka code without the
> need for special handling by Beam.  I'll reach out to Kafka mailing list as
> well and try to get a better understanding of the root issue.  Thanks for
> your time so far John! I'll ping this thread with any interesting findings
> or insights.
>
> Thanks,
> Evan
>
> On Tue, Sep 13, 2022 at 11:38 AM John Casey via user 
> wrote:
>
>> In principle yes, but I don't see any Beam level code to handle that. I'm
>> a bit surprised it isn't handled in the Kafka producer layer itself.
>>
>> On Tue, Sep 13, 2022 at 11:15 AM Evan Galpin  wrote:
>>
>>> I'm not certain based on the logs where the disconnect is starting.  I
>>> have seen TimeoutExceptions like that mentioned in the SO issue you linked,
>>> so if we assume it's starting from the kafka cluster side, my concern is
>>> that the producers don't seem to be able to gracefully recover.  Given that
>>> restarting the pipeline (in this case, in Dataflow) makes the issue go
>>> away, I'm under the impression that producer clients in KafkaIO#write can
>>> get into a state that they're not able to recover from after experiencing a
>>> disconnect.  Is graceful recovery after cluster unavailability something
>>> that would be expected to be supported by KafkaIO today?
>>>
>>> Thanks,
>>> Evan
>>>
>>> On Tue, Sep 13, 2022 at 11:07 AM John Casey via user <
>>> user@beam.apache.org> wrote:
>>>
 Googling that error message returned
 https://stackoverflow.com/questions/71077394/kafka-producer-resetting-the-last-seen-epoch-of-partition-resulting-in-timeout
 and
 https://github.com/a0x8o/kafka/blob/master/clients/src/main/java/org/apache/kafka/clients/Metadata.java#L402

 Which suggests that there is some sort of disconnect happening between
 your pipeline and your kafka instance.

 Do you see any logs when this disconnect starts, on the Beam or Kafka
 side of things?

 On Tue, Sep 13, 2022 at 10:38 AM Evan Galpin 
 wrote:

> Thanks for the quick reply John!  I should also add that the root
> issue is not so much the logging, rather that these log messages seem to be
> correlated with periods where producers are not able to publish data to
> kafka.  The issue of not being able to publish data does not seem to
> resolve until restarting or updating the pipeline.
>
> Here's my publisher config map:
>
> .withProducerConfigUpdates(
> Map.ofEntries(
> Map.entry(
> ProducerConfig.PARTITIONER_CLASS_CONFIG,
> DefaultPartitioner.class),
> Map.entry(
> ProducerConfig.COMPRESSION_TYPE_CONFIG,
> CompressionType.GZIP.name),
> Map.entry(
> CommonClientConfigs.SECURITY_PROTOCOL_CONFIG,
> SecurityProtocol.SASL_SSL.name),
> Map.entry(
> SaslConfigs.SASL_MECHANISM,
> PlainSaslServer.PLAIN_MECHANISM),
> Map.entry(
> SaslConfigs.SASL_JAAS_CONFIG,
> "org.apache.kafka.common.security.plain.PlainLoginModule required
> username=\"\" password=\"\";")))
>
> Thanks,
> Evan
>
> On Tue, Sep 13, 2022 at 10:30 AM John Casey 
> wrote:
>
>> Hi Evan,
>>
>> I haven't seen this before. Can you share your Kafka write
>> configuration, and any other stack traces that could be relevant?
>>
>> John
>>
>> On Tue, Sep 13, 2022 at 10:23 AM Evan Galpin 
>> wrote:
>>
>>> Hi all,
>>>
>>> I've recently started using the KafkaIO connector as a sink, and am

Re: [troubleshooting] KafkaIO#write gets stuck "since the associated topicId changed from null to "

2022-09-13 Thread Alexey Romanenko
Do you have by any chance the full stacktrace of this error?

—
Alexey

> On 13 Sep 2022, at 18:05, Evan Galpin  wrote:
> 
> Ya likewise, I'd expect this to be handled in the Kafka code without the need 
> for special handling by Beam.  I'll reach out to Kafka mailing list as well 
> and try to get a better understanding of the root issue.  Thanks for your 
> time so far John! I'll ping this thread with any interesting findings or 
> insights.
> 
> Thanks,
> Evan
> 
> On Tue, Sep 13, 2022 at 11:38 AM John Casey via user wrote:
> In principle yes, but I don't see any Beam level code to handle that. I'm a 
> bit surprised it isn't handled in the Kafka producer layer itself.
> 
> On Tue, Sep 13, 2022 at 11:15 AM Evan Galpin wrote:
> I'm not certain based on the logs where the disconnect is starting.  I have 
> seen TimeoutExceptions like that mentioned in the SO issue you linked, so if 
> we assume it's starting from the kafka cluster side, my concern is that the 
> producers don't seem to be able to gracefully recover.  Given that restarting 
> the pipeline (in this case, in Dataflow) makes the issue go away, I'm under 
> the impression that producer clients in KafkaIO#write can get into a state 
> that they're not able to recover from after experiencing a disconnect.  Is 
> graceful recovery after cluster unavailability something that would be 
> expected to be supported by KafkaIO today?
> 
> Thanks,
> Evan
> 
> On Tue, Sep 13, 2022 at 11:07 AM John Casey via user wrote:
> Googling that error message returned
> https://stackoverflow.com/questions/71077394/kafka-producer-resetting-the-last-seen-epoch-of-partition-resulting-in-timeout
> and
> https://github.com/a0x8o/kafka/blob/master/clients/src/main/java/org/apache/kafka/clients/Metadata.java#L402
> 
> Which suggests that there is some sort of disconnect happening between your 
> pipeline and your kafka instance.
> 
> Do you see any logs when this disconnect starts, on the Beam or Kafka side of 
> things?
> 
> On Tue, Sep 13, 2022 at 10:38 AM Evan Galpin wrote:
> Thanks for the quick reply John!  I should also add that the root issue is 
> not so much the logging, rather that these log messages seem to be correlated 
> with periods where producers are not able to publish data to kafka.  The 
> issue of not being able to publish data does not seem to resolve until 
> restarting or updating the pipeline.
> 
> Here's my publisher config map:
> 
> .withProducerConfigUpdates(
> Map.ofEntries(
> Map.entry(
> ProducerConfig.PARTITIONER_CLASS_CONFIG,
> DefaultPartitioner.class),
> Map.entry(
> ProducerConfig.COMPRESSION_TYPE_CONFIG,
> CompressionType.GZIP.name),
> Map.entry(
> CommonClientConfigs.SECURITY_PROTOCOL_CONFIG,
> SecurityProtocol.SASL_SSL.name),
> Map.entry(
> SaslConfigs.SASL_MECHANISM,
> PlainSaslServer.PLAIN_MECHANISM),
> Map.entry(
> SaslConfigs.SASL_JAAS_CONFIG,
> "org.apache.kafka.common.security.plain.PlainLoginModule required
> username=\"\" password=\"\";")))
> 
> Thanks,
> Evan
> 
> On Tue, Sep 13, 2022 at 10:30 AM John Casey wrote:
> Hi Evan,
> 
> I haven't seen this before. Can you share your Kafka write configuration, and 
> any other stack traces that could be relevant? 
> 
> John
> 
> On Tue, Sep 13, 2022 at 10:23 AM Evan Galpin wrote:
> Hi all,
> 
> I've recently started using the KafkaIO connector as a sink, and am new to 
> Kafka in general.  My kafka clusters are hosted by Confluent Cloud.  I'm 
> using Beam SDK 2.41.0.  At least daily, the producers in my Beam pipeline are 
> getting stuck in a loop frantically logging this message:
> 
> Node n disconnected.
> 
> Resetting the last seen epoch of partition  to x since the 
> associated topicId changed from null to 
> 
> Updating the running pipeline "resolves" the issue I believe as a result of 
> recreating the Kafka producer clients, but it seems that as-is the KafkaIO 
> producer clients are not resilient to node disconnects.  Might I be missing a 
> configuration option, or are there any known issues like this?
> 
> Thanks,
> Evan



Re: [troubleshooting] KafkaIO#write gets stuck "since the associated topicId changed from null to "

2022-09-13 Thread Evan Galpin
Ya likewise, I'd expect this to be handled in the Kafka code without the
need for special handling by Beam.  I'll reach out to the Kafka mailing list
as well and try to get a better understanding of the root issue.  Thanks for
your time so far, John! I'll ping this thread with any interesting findings
or insights.

Thanks,
Evan

On Tue, Sep 13, 2022 at 11:38 AM John Casey via user 
wrote:

> In principle yes, but I don't see any Beam level code to handle that. I'm
> a bit surprised it isn't handled in the Kafka producer layer itself.
>
> On Tue, Sep 13, 2022 at 11:15 AM Evan Galpin  wrote:
>
>> I'm not certain based on the logs where the disconnect is starting.  I
>> have seen TimeoutExceptions like that mentioned in the SO issue you linked,
>> so if we assume it's starting from the kafka cluster side, my concern is
>> that the producers don't seem to be able to gracefully recover.  Given that
>> restarting the pipeline (in this case, in Dataflow) makes the issue go
>> away, I'm under the impression that producer clients in KafkaIO#write can
>> get into a state that they're not able to recover from after experiencing a
>> disconnect.  Is graceful recovery after cluster unavailability something
>> that would be expected to be supported by KafkaIO today?
>>
>> Thanks,
>> Evan
>>
>> On Tue, Sep 13, 2022 at 11:07 AM John Casey via user <
>> user@beam.apache.org> wrote:
>>
>>> Googling that error message returned
>>> https://stackoverflow.com/questions/71077394/kafka-producer-resetting-the-last-seen-epoch-of-partition-resulting-in-timeout
>>> and
>>> https://github.com/a0x8o/kafka/blob/master/clients/src/main/java/org/apache/kafka/clients/Metadata.java#L402
>>>
>>> Which suggests that there is some sort of disconnect happening between
>>> your pipeline and your kafka instance.
>>>
>>> Do you see any logs when this disconnect starts, on the Beam or Kafka
>>> side of things?
>>>
>>> On Tue, Sep 13, 2022 at 10:38 AM Evan Galpin  wrote:
>>>
 Thanks for the quick reply John!  I should also add that the root issue
 is not so much the logging, rather that these log messages seem to be
 correlated with periods where producers are not able to publish data to
 kafka.  The issue of not being able to publish data does not seem to
 resolve until restarting or updating the pipeline.

 Here's my publisher config map:

 .withProducerConfigUpdates(
 Map.ofEntries(
 Map.entry(
 ProducerConfig.PARTITIONER_CLASS_CONFIG,
 DefaultPartitioner.class),
 Map.entry(
 ProducerConfig.COMPRESSION_TYPE_CONFIG,
 CompressionType.GZIP.name),
 Map.entry(

 CommonClientConfigs.SECURITY_PROTOCOL_CONFIG,
 SecurityProtocol.SASL_SSL.name),
 Map.entry(
 SaslConfigs.SASL_MECHANISM,
 PlainSaslServer.PLAIN_MECHANISM),
 Map.entry(
 SaslConfigs.SASL_JAAS_CONFIG,
 "org.apache.kafka.common.security.plain.PlainLoginModule required
 username=\"\" password=\"\";")))

 Thanks,
 Evan

 On Tue, Sep 13, 2022 at 10:30 AM John Casey 
 wrote:

> Hi Evan,
>
> I haven't seen this before. Can you share your Kafka write
> configuration, and any other stack traces that could be relevant?
>
> John
>
> On Tue, Sep 13, 2022 at 10:23 AM Evan Galpin 
> wrote:
>
>> Hi all,
>>
>> I've recently started using the KafkaIO connector as a sink, and am
>> new to Kafka in general.  My kafka clusters are hosted by Confluent 
>> Cloud.
>> I'm using Beam SDK 2.41.0.  At least daily, the producers in my Beam
>> pipeline are getting stuck in a loop frantically logging this message:
>>
>> Node n disconnected.
>>
>> Resetting the last seen epoch of partition  to x
>> since the associated topicId changed from null to 
>>
>> Updating the running pipeline "resolves" the issue I believe as a
>> result of recreating the Kafka producer clients, but it seems that as-is
>> the KafkaIO producer clients are not resilient to node disconnects.  
>> Might
>> I be missing a configuration option, or are there any known issues like
>> this?
>>
>> Thanks,
>> Evan
>>
>


Re: [troubleshooting] KafkaIO#write gets stuck "since the associated topicId changed from null to "

2022-09-13 Thread John Casey via user
In principle, yes, but I don't see any Beam-level code to handle that. I'm a
bit surprised it isn't handled in the Kafka producer layer itself.

On Tue, Sep 13, 2022 at 11:15 AM Evan Galpin  wrote:

> I'm not certain based on the logs where the disconnect is starting.  I
> have seen TimeoutExceptions like that mentioned in the SO issue you linked,
> so if we assume it's starting from the kafka cluster side, my concern is
> that the producers don't seem to be able to gracefully recover.  Given that
> restarting the pipeline (in this case, in Dataflow) makes the issue go
> away, I'm under the impression that producer clients in KafkaIO#write can
> get into a state that they're not able to recover from after experiencing a
> disconnect.  Is graceful recovery after cluster unavailability something
> that would be expected to be supported by KafkaIO today?
>
> Thanks,
> Evan
>
> On Tue, Sep 13, 2022 at 11:07 AM John Casey via user 
> wrote:
>
>> Googling that error message returned
>> https://stackoverflow.com/questions/71077394/kafka-producer-resetting-the-last-seen-epoch-of-partition-resulting-in-timeout
>> and
>> https://github.com/a0x8o/kafka/blob/master/clients/src/main/java/org/apache/kafka/clients/Metadata.java#L402
>>
>> Which suggests that there is some sort of disconnect happening between
>> your pipeline and your kafka instance.
>>
>> Do you see any logs when this disconnect starts, on the Beam or Kafka
>> side of things?
>>
>> On Tue, Sep 13, 2022 at 10:38 AM Evan Galpin  wrote:
>>
>>> Thanks for the quick reply John!  I should also add that the root issue
>>> is not so much the logging, rather that these log messages seem to be
>>> correlated with periods where producers are not able to publish data to
>>> kafka.  The issue of not being able to publish data does not seem to
>>> resolve until restarting or updating the pipeline.
>>>
>>> Here's my publisher config map:
>>>
>>> .withProducerConfigUpdates(
>>> Map.ofEntries(
>>> Map.entry(
>>> ProducerConfig.PARTITIONER_CLASS_CONFIG,
>>> DefaultPartitioner.class),
>>> Map.entry(
>>> ProducerConfig.COMPRESSION_TYPE_CONFIG,
>>> CompressionType.GZIP.name),
>>> Map.entry(
>>>
>>> CommonClientConfigs.SECURITY_PROTOCOL_CONFIG,
>>> SecurityProtocol.SASL_SSL.name),
>>> Map.entry(
>>> SaslConfigs.SASL_MECHANISM,
>>> PlainSaslServer.PLAIN_MECHANISM),
>>> Map.entry(
>>> SaslConfigs.SASL_JAAS_CONFIG,
>>> "org.apache.kafka.common.security.plain.PlainLoginModule required
>>> username=\"\" password=\"\";")))
>>>
>>> Thanks,
>>> Evan
>>>
>>> On Tue, Sep 13, 2022 at 10:30 AM John Casey 
>>> wrote:
>>>
 Hi Evan,

 I haven't seen this before. Can you share your Kafka write
 configuration, and any other stack traces that could be relevant?

 John

 On Tue, Sep 13, 2022 at 10:23 AM Evan Galpin 
 wrote:

> Hi all,
>
> I've recently started using the KafkaIO connector as a sink, and am
> new to Kafka in general.  My kafka clusters are hosted by Confluent Cloud.
> I'm using Beam SDK 2.41.0.  At least daily, the producers in my Beam
> pipeline are getting stuck in a loop frantically logging this message:
>
> Node n disconnected.
>
> Resetting the last seen epoch of partition  to x
> since the associated topicId changed from null to 
>
> Updating the running pipeline "resolves" the issue I believe as a
> result of recreating the Kafka producer clients, but it seems that as-is
> the KafkaIO producer clients are not resilient to node disconnects.  Might
> I be missing a configuration option, or are there any known issues like
> this?
>
> Thanks,
> Evan
>



Re: [troubleshooting] KafkaIO#write gets stuck "since the associated topicId changed from null to "

2022-09-13 Thread Evan Galpin
I'm not certain based on the logs where the disconnect is starting.  I have
seen TimeoutExceptions like that mentioned in the SO issue you linked, so
if we assume it's starting from the Kafka cluster side, my concern is that
the producers don't seem to be able to gracefully recover.  Given that
restarting the pipeline (in this case, in Dataflow) makes the issue go
away, I'm under the impression that producer clients in KafkaIO#write can
get into a state that they're not able to recover from after experiencing a
disconnect.  Is graceful recovery after cluster unavailability something
that would be expected to be supported by KafkaIO today?

Thanks,
Evan

On Tue, Sep 13, 2022 at 11:07 AM John Casey via user 
wrote:

> Googling that error message returned
> https://stackoverflow.com/questions/71077394/kafka-producer-resetting-the-last-seen-epoch-of-partition-resulting-in-timeout
> and
> https://github.com/a0x8o/kafka/blob/master/clients/src/main/java/org/apache/kafka/clients/Metadata.java#L402
>
> Which suggests that there is some sort of disconnect happening between
> your pipeline and your kafka instance.
>
> Do you see any logs when this disconnect starts, on the Beam or Kafka side
> of things?
>
> On Tue, Sep 13, 2022 at 10:38 AM Evan Galpin  wrote:
>
>> Thanks for the quick reply John!  I should also add that the root issue
>> is not so much the logging, rather that these log messages seem to be
>> correlated with periods where producers are not able to publish data to
>> kafka.  The issue of not being able to publish data does not seem to
>> resolve until restarting or updating the pipeline.
>>
>> Here's my publisher config map:
>>
>> .withProducerConfigUpdates(
>> Map.ofEntries(
>> Map.entry(
>> ProducerConfig.PARTITIONER_CLASS_CONFIG,
>> DefaultPartitioner.class),
>> Map.entry(
>> ProducerConfig.COMPRESSION_TYPE_CONFIG,
>> CompressionType.GZIP.name),
>> Map.entry(
>> CommonClientConfigs.SECURITY_PROTOCOL_CONFIG,
>> SecurityProtocol.SASL_SSL.name),
>> Map.entry(
>> SaslConfigs.SASL_MECHANISM,
>> PlainSaslServer.PLAIN_MECHANISM),
>> Map.entry(
>> SaslConfigs.SASL_JAAS_CONFIG,
>> "org.apache.kafka.common.security.plain.PlainLoginModule required
>> username=\"\" password=\"\";")))
>>
>> Thanks,
>> Evan
>>
>> On Tue, Sep 13, 2022 at 10:30 AM John Casey 
>> wrote:
>>
>>> Hi Evan,
>>>
>>> I haven't seen this before. Can you share your Kafka write
>>> configuration, and any other stack traces that could be relevant?
>>>
>>> John
>>>
>>> On Tue, Sep 13, 2022 at 10:23 AM Evan Galpin  wrote:
>>>
 Hi all,

 I've recently started using the KafkaIO connector as a sink, and am new
 to Kafka in general.  My kafka clusters are hosted by Confluent Cloud.  I'm
 using Beam SDK 2.41.0.  At least daily, the producers in my Beam pipeline
 are getting stuck in a loop frantically logging this message:

 Node n disconnected.

 Resetting the last seen epoch of partition  to x since
 the associated topicId changed from null to 

 Updating the running pipeline "resolves" the issue I believe as a
 result of recreating the Kafka producer clients, but it seems that as-is
 the KafkaIO producer clients are not resilient to node disconnects.  Might
 I be missing a configuration option, or are there any known issues like
 this?

 Thanks,
 Evan

>>>


Re: [troubleshooting] KafkaIO#write gets stuck "since the associated topicId changed from null to "

2022-09-13 Thread John Casey via user
Googling that error message returned
https://stackoverflow.com/questions/71077394/kafka-producer-resetting-the-last-seen-epoch-of-partition-resulting-in-timeout
and
https://github.com/a0x8o/kafka/blob/master/clients/src/main/java/org/apache/kafka/clients/Metadata.java#L402

This suggests that there is some sort of disconnect happening between your
pipeline and your Kafka instance.

Do you see any logs when this disconnect starts, on the Beam or Kafka side
of things?

On Tue, Sep 13, 2022 at 10:38 AM Evan Galpin  wrote:

> Thanks for the quick reply John!  I should also add that the root issue is
> not so much the logging, rather that these log messages seem to be
> correlated with periods where producers are not able to publish data to
> kafka.  The issue of not being able to publish data does not seem to
> resolve until restarting or updating the pipeline.
>
> Here's my publisher config map:
>
> .withProducerConfigUpdates(
> Map.ofEntries(
> Map.entry(
> ProducerConfig.PARTITIONER_CLASS_CONFIG,
> DefaultPartitioner.class),
> Map.entry(
> ProducerConfig.COMPRESSION_TYPE_CONFIG,
> CompressionType.GZIP.name),
> Map.entry(
> CommonClientConfigs.SECURITY_PROTOCOL_CONFIG,
> SecurityProtocol.SASL_SSL.name),
> Map.entry(
> SaslConfigs.SASL_MECHANISM,
> PlainSaslServer.PLAIN_MECHANISM),
> Map.entry(
> SaslConfigs.SASL_JAAS_CONFIG,
> "org.apache.kafka.common.security.plain.PlainLoginModule required
> username=\"\" password=\"\";")))
>
> Thanks,
> Evan
>
> On Tue, Sep 13, 2022 at 10:30 AM John Casey  wrote:
>
>> Hi Evan,
>>
>> I haven't seen this before. Can you share your Kafka write configuration,
>> and any other stack traces that could be relevant?
>>
>> John
>>
>> On Tue, Sep 13, 2022 at 10:23 AM Evan Galpin  wrote:
>>
>>> Hi all,
>>>
>>> I've recently started using the KafkaIO connector as a sink, and am new
>>> to Kafka in general.  My kafka clusters are hosted by Confluent Cloud.  I'm
>>> using Beam SDK 2.41.0.  At least daily, the producers in my Beam pipeline
>>> are getting stuck in a loop frantically logging this message:
>>>
>>> Node n disconnected.
>>>
>>> Resetting the last seen epoch of partition  to x since
>>> the associated topicId changed from null to 
>>>
>>> Updating the running pipeline "resolves" the issue I believe as a result
>>> of recreating the Kafka producer clients, but it seems that as-is the
>>> KafkaIO producer clients are not resilient to node disconnects.  Might I be
>>> missing a configuration option, or are there any known issues like this?
>>>
>>> Thanks,
>>> Evan
>>>
>>


Re: [troubleshooting] KafkaIO#write gets stuck "since the associated topicId changed from null to "

2022-09-13 Thread Evan Galpin
Thanks for the quick reply, John!  I should also add that the root issue is
not so much the logging, but rather that these log messages seem to be
correlated with periods where producers are not able to publish data to
Kafka.  The issue of not being able to publish data does not seem to
resolve until restarting or updating the pipeline.

Here's my publisher config map:

.withProducerConfigUpdates(
Map.ofEntries(
Map.entry(
ProducerConfig.PARTITIONER_CLASS_CONFIG,
DefaultPartitioner.class),
Map.entry(
ProducerConfig.COMPRESSION_TYPE_CONFIG,
CompressionType.GZIP.name),
Map.entry(
CommonClientConfigs.SECURITY_PROTOCOL_CONFIG,
SecurityProtocol.SASL_SSL.name),
Map.entry(
SaslConfigs.SASL_MECHANISM,
PlainSaslServer.PLAIN_MECHANISM),
Map.entry(
SaslConfigs.SASL_JAAS_CONFIG,
"org.apache.kafka.common.security.plain.PlainLoginModule required
username=\"\" password=\"\";")))

Thanks,
Evan
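
For anyone searching later: below is a speculative variant of the config map
above with the standard Kafka producer timeout and reconnect settings spelled
out. These are ordinary Kafka client options rather than anything
Beam-specific, the values are illustrative only, and this is not a confirmed
fix for the disconnect loop described in this thread; it mainly makes the
producer surface failures or pace reconnects instead of appearing stuck.

.withProducerConfigUpdates(
    Map.ofEntries(
        Map.entry(ProducerConfig.PARTITIONER_CLASS_CONFIG, DefaultPartitioner.class),
        Map.entry(ProducerConfig.COMPRESSION_TYPE_CONFIG, CompressionType.GZIP.name),
        Map.entry(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, SecurityProtocol.SASL_SSL.name),
        Map.entry(SaslConfigs.SASL_MECHANISM, PlainSaslServer.PLAIN_MECHANISM),
        Map.entry(SaslConfigs.SASL_JAAS_CONFIG,
            "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";"),
        // How long send() may block on metadata or a full buffer before
        // failing with a TimeoutException instead of appearing stuck
        // (Kafka default is 60000).
        Map.entry(ProducerConfig.MAX_BLOCK_MS_CONFIG, 60000),
        // Upper bound on how long a record may sit in retries before the
        // producer reports an error back to the pipeline (default 120000).
        Map.entry(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000),
        // Backoff between reconnect attempts after a node disconnect.
        Map.entry(CommonClientConfigs.RECONNECT_BACKOFF_MS_CONFIG, 1000),
        Map.entry(CommonClientConfigs.RECONNECT_BACKOFF_MAX_MS_CONFIG, 10000),
        // Cap on how long a new socket connection may take to establish
        // (this is the "socket connection setup timeout" from the logs).
        Map.entry(CommonClientConfigs.SOCKET_CONNECTION_SETUP_TIMEOUT_MS_CONFIG, 10000)))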

On Tue, Sep 13, 2022 at 10:30 AM John Casey  wrote:

> Hi Evan,
>
> I haven't seen this before. Can you share your Kafka write configuration,
> and any other stack traces that could be relevant?
>
> John
>
> On Tue, Sep 13, 2022 at 10:23 AM Evan Galpin  wrote:
>
>> Hi all,
>>
>> I've recently started using the KafkaIO connector as a sink, and am new
>> to Kafka in general.  My kafka clusters are hosted by Confluent Cloud.  I'm
>> using Beam SDK 2.41.0.  At least daily, the producers in my Beam pipeline
>> are getting stuck in a loop frantically logging this message:
>>
>> Node n disconnected.
>>
>> Resetting the last seen epoch of partition  to x since
>> the associated topicId changed from null to 
>>
>> Updating the running pipeline "resolves" the issue I believe as a result
>> of recreating the Kafka producer clients, but it seems that as-is the
>> KafkaIO producer clients are not resilient to node disconnects.  Might I be
>> missing a configuration option, or are there any known issues like this?
>>
>> Thanks,
>> Evan
>>
>


Re: [troubleshooting] KafkaIO#write gets stuck "since the associated topicId changed from null to "

2022-09-13 Thread John Casey via user
Hi Evan,

I haven't seen this before. Can you share your Kafka write configuration,
and any other stack traces that could be relevant?

John

On Tue, Sep 13, 2022 at 10:23 AM Evan Galpin  wrote:

> Hi all,
>
> I've recently started using the KafkaIO connector as a sink, and am new to
> Kafka in general.  My kafka clusters are hosted by Confluent Cloud.  I'm
> using Beam SDK 2.41.0.  At least daily, the producers in my Beam pipeline
> are getting stuck in a loop frantically logging this message:
>
> Node n disconnected.
>
> Resetting the last seen epoch of partition  to x since
> the associated topicId changed from null to 
>
> Updating the running pipeline "resolves" the issue I believe as a result
> of recreating the Kafka producer clients, but it seems that as-is the
> KafkaIO producer clients are not resilient to node disconnects.  Might I be
> missing a configuration option, or are there any known issues like this?
>
> Thanks,
> Evan
>


[troubleshooting] KafkaIO#write gets stuck "since the associated topicId changed from null to "

2022-09-13 Thread Evan Galpin
Hi all,

I've recently started using the KafkaIO connector as a sink, and am new to
Kafka in general.  My Kafka clusters are hosted by Confluent Cloud.  I'm
using Beam SDK 2.41.0.  At least daily, the producers in my Beam pipeline
are getting stuck in a loop frantically logging this message:

Node n disconnected.

Resetting the last seen epoch of partition  to x since the
associated topicId changed from null to 

Updating the running pipeline "resolves" the issue, I believe as a result of
recreating the Kafka producer clients, but it seems that, as-is, the KafkaIO
producer clients are not resilient to node disconnects.  Might I be missing
a configuration option, or are there any known issues like this?

Thanks,
Evan