Apologies if this is slightly off-Beam-topic.

I'm setting up a small demo for some colleagues to parse a 21-million line CSV file using a few different runners (Direct, Flink, and Google Dataflow). The TextIO using the direct runner seems a bit slow, so I was wondering if it's perhaps related to the IO. So I pushed the CSV to a Kafka topic (one line per entry). Now to test the topic I'd like to read from the start every time. I'm quite new to Kafka, so I'm not sure how to "reset" it. A bit of searching indicates that something like this for the ConsumerProperties should be correct:

        p.apply(KafkaIO.read()
            .withBootstrapServers(options.getBrokerUrl())
            .withTopics(ImmutableList.of(options.getTopic()))
            .withValueCoder(StringUtf8Coder.of())
            .updateConsumerProperties(
*                ImmutableMap.of(**
** ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString(),**
**                    ConsumerConfig.CLIENT_ID_CONFIG, "your_client_id",**
**ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"**
**                )*
            ))

But it didn't seem to work. Is there some other setting that I'm missing?

Kind regards,

Gareth

Reply via email to