Thanks for testing this!
It requires some additional investigations, so I created an issue for that:
https://github.com/apache/beam/issues/26262
Feel free to add more details if you have there.
—
Alexey
> On 13 Apr 2023, at 12:45, Sigalit Eliazov wrote:
I have made the suggested change and used
ConfluentSchemaRegistryDeserializerProvider
the results are slightly better: an average of 8000 msg/sec.
Thank you both for your responses. I'd appreciate it if you could keep me in
the loop on the planned work with Kafka schemas, or let me know if I can
assist.
Mine was similar, but
"org.apache.beam.sdk.io.kafka.ConfluentSchemaRegistryDeserializerProvider"
leverages
"io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient", which I
guessed should reduce this potential impact.
—
Alexey
> On 12 Apr 2023, at 17:36, John Casey via user wrote:
My initial guess is that queries are being made to the registry in order to
retrieve the schemas, which would impact performance, especially if those
queries aren't cached with Beam splitting in mind.
I'm looking to improve our interaction with Kafka schemas in the next
couple of quarters, so I'll keep you posted.
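To make the caching guess above concrete, here is a small self-contained sketch of the idea (names and the fake loader are illustrative only, not Beam or Confluent API): resolve each schema id against the registry at most once, then serve repeats from a local map, which is roughly what a caching registry client does.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of schema-lookup caching. The lambda stands in for
// the HTTP round trip a real client would make to the schema registry.
class SchemaCache {
    private final ConcurrentMap<Integer, String> byId = new ConcurrentHashMap<>();
    final AtomicInteger remoteFetches = new AtomicInteger();

    String schemaFor(int schemaId) {
        // computeIfAbsent runs the loader at most once per missing key,
        // so repeated records with the same schema id cost no extra fetch.
        return byId.computeIfAbsent(schemaId, id -> {
            remoteFetches.incrementAndGet();
            return "{\"type\":\"record\",\"id\":" + id + "}"; // placeholder payload
        });
    }
}
```

If lookups are instead done per record (or per split without a shared cache), every message pays the round-trip cost, which would match the throughput drop described in this thread.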
I don’t have an exact answer for why it’s so much slower (only some guesses,
and it requires some profiling), but could you try the same Kafka
read with “ConfluentSchemaRegistryDeserializerProvider” instead of
KafkaAvroDeserializer and AvroCoder?
More details and an example
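A minimal sketch of what that swap could look like, assuming a registry at registryUrl and the default "<topic>-value" subject naming; bootstrapServers and topic are taken from the snippet later in this thread:

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.kafka.ConfluentSchemaRegistryDeserializerProvider;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.StringDeserializer;

// Sketch: let the registry-backed provider handle Avro value decoding
// (and the coder) instead of wiring KafkaAvroDeserializer + AvroCoder.
KafkaIO.<String, GenericRecord>read()
    .withBootstrapServers(bootstrapServers)
    .withTopic(topic)
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(
        ConfluentSchemaRegistryDeserializerProvider.of(
            registryUrl, topic + "-value"));
```

The provider infers the value coder from the registry schema, so no separate AvroCoder registration is needed for the read step.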
hi,
KafkaIO.read()
    .withBootstrapServers(bootstrapServers)
    .withTopic(topic)
    .withConsumerConfigUpdates(Map.ofEntries(
        Map.entry("schema.registry.url", registryURL),
        Map.entry(ConsumerConfig.GROUP_ID_CONFIG, consumerGroup +
How are you using the schema registry? Do you have a code sample?
On Sun, Apr 9, 2023 at 3:06 AM Sigalit Eliazov wrote:
Hello,
I am trying to understand the effect of the schema registry on our pipeline's
performance. In order to do so, we created a very simple pipeline that reads
from Kafka, runs a simple transformation that adds a new field, and writes
back to Kafka. The messages are in Avro format.
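The middle step of that benchmark can be sketched without any Beam or Avro dependencies as a pure function on a record; this stand-in treats records as plain maps (the real pipeline would operate on GenericRecord with an extended writer schema), and the field name below is purely illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for the "add a new field" transform: copy the
// record and attach one extra entry, leaving the input record untouched.
class AddFieldFn {
    static Map<String, Object> addField(Map<String, Object> record,
                                        String name, Object value) {
        Map<String, Object> out = new HashMap<>(record);
        out.put(name, value);
        return out;
    }
}
```

Keeping the transform side-effect free like this makes it safe for a runner to retry or re-bundle elements.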
I ran this pipeline with