Apologies if this is slightly off-Beam-topic.
I'm setting up a small demo for some colleagues to parse a 21-million
line CSV file using a few different runners (Direct, Flink, and Google
Dataflow). The TextIO using the direct runner seems a bit slow, so I was
wondering if it's perhaps related to the IO. So I pushed the CSV to a
Kafka topic (one line per entry). Now to test the topic I'd like to read
from the start every time. I'm quite new to Kafka, so I'm not sure how
to "reset" it. A bit of searching indicates that something like this for
the ConsumerProperties should be correct:
p.apply(KafkaIO.read()
.withBootstrapServers(options.getBrokerUrl())
.withTopics(ImmutableList.of(options.getTopic()))
.withValueCoder(StringUtf8Coder.of())
.updateConsumerProperties(
* ImmutableMap.of(**
** ConsumerConfig.GROUP_ID_CONFIG,
UUID.randomUUID().toString(),**
** ConsumerConfig.CLIENT_ID_CONFIG, "your_client_id",**
**ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"**
** )*
))
But it didn't seem to work. Is there some other setting that I'm missing?
Kind regards,
Gareth