[ https://issues.apache.org/jira/browse/KAFKA-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bruno Cadonna resolved KAFKA-13887.
-----------------------------------
    Resolution: Not A Problem

> Running multiple instance of same stateful KafkaStreams application on single host raise Exception
> --------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13887
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13887
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 2.6.0
>            Reporter: Sina Askarnejad
>            Priority: Minor
>
> KAFKA-10716 locks the state store directory on the running host, as it
> stores the processId in a *kafka-streams-process-metadata* file in this
> path. As a result, to run multiple instances of the same application on a
> single host, each instance must run with a different *state.dir* config;
> otherwise the following exception is raised for the second instance:
>
> Exception in thread "main" org.apache.kafka.streams.errors.StreamsException: Unable to initialize state, this can happen if multiple instances of Kafka Streams are running in the same state directory
>     at org.apache.kafka.streams.processor.internals.StateDirectory.initializeProcessId(StateDirectory.java:191)
>     at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:868)
>     at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:851)
>     at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:821)
>     at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:733)
>
> The easiest solution is multi-threading: running a single instance with
> multiple threads. However, multi-threaded programming is not suitable for
> all scenarios, e.g., when the tasks are CPU intensive, in large-scale
> scenarios, or when fully utilizing multi-core CPUs.
>
> The second solution is multi-processing. On a single host this solution
> needs extra work and care, as each instance needs to be run with a
> different *state.dir*. It would be a good enhancement if KafkaStreams
> could handle this config for multiple instances.
>
> The proposed solution is that KafkaStreams uses the
> */{state.dir}/{application.id}/{ordinal.number}* path instead of
> */{state.dir}/{application.id}* to store the meta file and states. The
> *ordinal.number* starts at 0 and is incremental.
> When an instance starts, it checks the ordinal.number directories starting
> from 0, finds the first subdirectory that is not locked, and uses it as its
> state directory. This way all tasks are assigned correctly on rebalance and
> multiple instances can run on a single host.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
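
For illustration only, the directory probing proposed in the quoted description could be sketched as below. This is not Kafka Streams code (the issue was resolved as Not A Problem), and every name in it is hypothetical: the `OrdinalStateDir` class, the `Claim` type, the `.lock` file name, and the `maxInstances` bound are assumptions made for the sketch. It uses only standard `java.nio` file locking and is meant for the one-instance-per-process case the issue describes.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Kafka Streams.
final class OrdinalStateDir {

    /** A claimed ordinal directory plus the lock that keeps other instances out. */
    static final class Claim implements AutoCloseable {
        final Path dir;
        final int ordinal;
        private final FileChannel channel;
        private final FileLock lock;

        Claim(Path dir, int ordinal, FileChannel channel, FileLock lock) {
            this.dir = dir;
            this.ordinal = ordinal;
            this.channel = channel;
            this.lock = lock;
        }

        @Override
        public void close() throws IOException {
            lock.release();
            channel.close();
        }
    }

    /**
     * Probes {stateDir}/{applicationId}/0, /1, ... and returns the first
     * directory whose lock file (assumed name ".lock") can be locked.
     */
    static Claim claim(Path stateDir, String applicationId, int maxInstances)
            throws IOException {
        Path appDir = stateDir.resolve(applicationId);
        for (int ordinal = 0; ordinal < maxInstances; ordinal++) {
            Path dir = appDir.resolve(Integer.toString(ordinal));
            Files.createDirectories(dir);
            FileChannel channel = FileChannel.open(
                    dir.resolve(".lock"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock;
            try {
                // Returns null when another process already holds the lock.
                lock = channel.tryLock();
            } catch (OverlappingFileLockException e) {
                // Thrown when this same JVM already holds the lock.
                lock = null;
            }
            if (lock != null) {
                return new Claim(dir, ordinal, channel, lock);
            }
            channel.close(); // directory is taken, try the next ordinal
        }
        throw new IOException("No free ordinal directory under " + appDir);
    }
}
```

An instance would call `OrdinalStateDir.claim(...)` once at startup, hold the returned `Claim` for its lifetime, and pass `claim.dir` on as its effective state directory. Because a freed ordinal is reused by the next instance to start, a restarted process tends to pick up the state left in the lowest free directory, which is the behaviour the proposal relies on.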