Shekharrajak opened a new pull request, #53626:
URL: https://github.com/apache/spark/pull/53626
### What changes were proposed in this pull request?
This PR adds support for Kafka 4.x Share Groups (KIP-932) in Spark
Structured Streaming, enabling queue semantics where multiple consumers can
receive records from the same partition concurrently.
Key components:
- New `kafka-share` data source format
- Non-sequential offset tracking via `ShareStateBatch`
- Acknowledgment-based delivery (ACCEPT/RELEASE/REJECT)
- Three exactly-once strategies: idempotent sink, two-phase commit,
checkpoint-dedup
- Recovery support via acquisition lock expiry and checkpointing
### Why are the changes needed?
Traditional Kafka consumer groups provide pub/sub semantics where each
partition is assigned to exactly one consumer. Share Groups (KIP-932) introduce
queue semantics enabling:
1. Load balancing across consumers without partition limits
2. Automatic redelivery on failure without manual offset management
3. Per-record acknowledgment for fine-grained processing control
This is essential for workloads requiring high fan-out consumption patterns.
### Does this PR introduce _any_ user-facing change?
Yes. Adds new `kafka-share` data source:
spark.readStream
.format("kafka-share")
.option("kafka.bootstrap.servers", "localhost:9092")
.option("kafka.share.group.id", "my-share-group")
.option("subscribe", "topic1")
.load()New configuration options:
- `kafka.share.group.id` (required)
- `kafka.share.acknowledgment.mode` (implicit/explicit)
- `kafka.share.exactly.once.strategy`
(none/idempotent/two-phase-commit/checkpoint-dedup)
### How was this patch tested?
- Unit tests for `ShareStateBatch`, `KafkaShareSourceOffset`,
`ShareInFlightRecord`
- Integration tests for fault tolerance and recovery scenarios
- Tests for checkpoint-based deduplication and two-phase commit coordinator
### Was this patch authored or co-authored using generative AI tooling?
Cursor IDE
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]