Hi Catlyn,

This option has never worked outside the Java SDK where it originates from. For the upcoming Beam 2.16.0 release, we have replaced this option with a factory class: https://github.com/apache/beam/blob/da6c1a8f435f5583811785050808a2311db94047/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java#L152

We could bundle a default factory class for the RocksDB state backend and make this easier to configure from Python. Do you mind opening a JIRA issue for this? I can give you permissions for https://issues.apache.org/jira/projects/BEAM/issues if you create an account.

In any case, you could also configure the Flink cluster to use RocksDB. Beam will use whatever state backend is configured.

Thanks,
Max

On 23.08.19 02:57, Catlyn Kong wrote:
Hi all,

I'm experimenting with checkpoints/savepoints in Beam (version 2.14)
when using a Flink (version 1.6.4) runner. Flink was able to take
periodic checkpoints when I setup the flink-conf.yaml correctly. But I
was thinking if it's possible to set the StateBackend on a per job level
by flagging the --state_backend=RocksDBStateBackend option since it's
said to be supported here
<https://beam.apache.org/documentation/runners/flink/>.

But instead I got the following error:
RuntimeError: Pipeline failed in state FAILED:
com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannot
construct instance of `org.apache.flink.runtime.state.StateBackend` (no
Creators, like default construct, exist): abstract types either need to
be mapped to concrete types, have custom deserializer, or contain
additional type information
 at [Source: (String)""RocksDBStateBackend""; line: 1, column: 1]

I then saw that there's:
@JasonIgnore
StateBackend getStateBackend();

I'm wondering if this is not supported in python yet? If yes then do we
have plans to support this in the near future?

Best,
Catlyn

Reply via email to