+1 for prioritizing this.

On Thu, Jul 9, 2020 at 8:08 AM Piotr Filipiuk <[email protected]> wrote:
> Thank you for looking into this.
>
> I upvoted BEAM-6868 <https://issues.apache.org/jira/browse/BEAM-6868>.
> Is there anything else I can do to have that feature prioritized, other
> than trying to contribute myself?
>
> Regarding DirectRunner, as I mentioned above, I can see in the
> worker_handlers.py logs that some data is being consumed from Kafka, since
> the offsets logged are consistent with what is being published. Is using
> DirectRunner known to work for this use case? I do not see any errors
> (I attached all the logs). The apache/beam_java_sdk:2.22.0 container is
> running (see logs attached).
>
> On Thu, Jul 9, 2020 at 7:45 AM Maximilian Michels <[email protected]> wrote:
>
>> This used to work, but it appears @FinalizeBundle (which KafkaIO
>> requires) was simply ignored for portable (Python) pipelines. It looks
>> relatively easy to fix.
>>
>> -Max
>>
>> On 07.07.20 03:37, Luke Cwik wrote:
>> > The KafkaIO implementation relies on checkpointing to be able to update
>> > the last committed offset. This is currently unsupported in the portable
>> > Flink runner. BEAM-6868 [1] is the associated JIRA. Please vote on it
>> > and/or offer to provide an implementation.
>> >
>> > 1: https://issues.apache.org/jira/browse/BEAM-6868
>> >
>> > On Mon, Jul 6, 2020 at 1:42 PM Piotr Filipiuk <[email protected]> wrote:
>> >
>> >     Hi,
>> >
>> >     I am trying to run a simple example that uses the Python API to
>> >     ReadFromKafka; however, I am getting the following error when using
>> >     the Flink runner:
>> >
>> >     java.lang.UnsupportedOperationException: The ActiveBundle does not
>> >     have a registered bundle checkpoint handler.
>> >
>> >     See the full log in read_from_kafka_flink.log.
>> >
>> >     I am using:
>> >     Kafka 2.5.0
>> >     Beam 2.22.0
>> >     Flink 1.10
>> >
>> >     When using the Direct runner, the pipeline does not fail, but it does
>> >     not seem to be consuming any data (see read_from_kafka.log), even
>> >     though the updated offsets are being logged:
>> >
>> >     [2020-07-06 13:36:01,342] {worker_handlers.py:398} INFO - severity: INFO
>> >     timestamp {
>> >       seconds: 1594067761
>> >       nanos: 340000000
>> >     }
>> >     message: "Reader-0: reading from test-topic-0 starting at offset 165"
>> >     log_location: "org.apache.beam.sdk.io.kafka.KafkaUnboundedSource"
>> >     thread: "23"
>> >
>> >     I am running both Kafka and Flink locally. I would appreciate your
>> >     help understanding and fixing the issue.
>> >
>> >     --
>> >     Best regards,
>> >     Piotr
>> >
>
> --
> Best regards,
> Piotr
>
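[For context, a minimal sketch of the kind of pipeline being discussed. This is NOT the original poster's code; it is a hypothetical reconstruction. The topic name "test-topic" is taken from the attached log; the broker address localhost:9092, the group id, and the offset-reset setting are placeholder assumptions for a local Kafka setup.]

```python
# Hypothetical sketch of a Python Beam pipeline reading from Kafka.
# Assumes a local broker on localhost:9092 (placeholder) and the
# topic "test-topic" seen in the thread's log output.

def kafka_consumer_config(bootstrap_servers="localhost:9092",
                          group_id="beam-kafka-test"):
    # Plain dict of Kafka consumer properties; ReadFromKafka forwards
    # these to the Java KafkaIO expansion service.
    return {
        "bootstrap.servers": bootstrap_servers,
        "auto.offset.reset": "earliest",
        "group.id": group_id,
    }


def build_pipeline(argv=None):
    # Imports are kept local so the config helper above can be used
    # without apache_beam installed.
    import apache_beam as beam
    from apache_beam.io.kafka import ReadFromKafka
    from apache_beam.options.pipeline_options import PipelineOptions

    pipeline = beam.Pipeline(options=PipelineOptions(argv))
    (
        pipeline
        | "ReadFromKafka" >> ReadFromKafka(
            consumer_config=kafka_consumer_config(),
            topics=["test-topic"])
        | "LogRecords" >> beam.Map(print)
    )
    return pipeline
```

[To target a local Flink cluster as in the thread, one would run it with flags along the lines of `build_pipeline(["--runner=FlinkRunner", "--flink_master=localhost:8081", "--environment_type=LOOPBACK"]).run()`; exact flags depend on the Beam version and Flink setup.]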
