gaborgsomogyi commented on a change in pull request #23929:
[SPARK-27022][DSTREAMS] Add kafka delegation token support.
URL: https://github.com/apache/spark/pull/23929#discussion_r262424875
##########
File path: docs/streaming-kafka-0-10-integration.md
##########
@@ -277,9 +277,79 @@ stream.foreachRDD(rdd -> {
</div>
</div>
-### SSL / TLS
-The new Kafka consumer [supports
SSL](http://kafka.apache.org/documentation.html#security_ssl). To enable it,
set kafkaParams appropriately before passing to `createDirectStream` /
`createRDD`. Note that this only applies to communication between Spark and
Kafka brokers; you are still responsible for separately
[securing](security.html) Spark inter-node communication.
+### Deploying
+
+As with any Spark application, `spark-submit` is used to launch your
application.
+
+For Scala and Java applications, if you are using SBT or Maven for project
management, then package
`spark-streaming-kafka-0-10_{{site.SCALA_BINARY_VERSION}}` and its dependencies
into the application JAR. Make sure `spark-core_{{site.SCALA_BINARY_VERSION}}`
and `spark-streaming_{{site.SCALA_BINARY_VERSION}}` are marked as `provided`
dependencies as those are already present in a Spark installation. Then use
`spark-submit` to launch your application (see the [Deploying
section](streaming-programming-guide.html#deploying-applications) in the main
programming guide).
+
+### Security
+
+Kafka 0.9.0.0 introduced several features that increase security in a
cluster. For a detailed
+description of these features, see the [Kafka security
docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against a Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**
+- **JAAS login configuration**
+
+#### Delegation token
+
+With this approach, the application can be configured via Spark parameters and may not
need a JAAS login
+configuration (Spark can use Kafka's dynamic JAAS configuration feature). For
further information
+about delegation tokens, see the [Kafka delegation token
docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+The process is initiated by Spark's Kafka delegation token provider. When
`spark.kafka.bootstrap.servers` is set,
+Spark considers the following login options, in order of preference:
+- **JAAS login configuration** (see the example below).
+- **Keytab file**, for example:
+
+    ./bin/spark-submit \
+        --keytab <KEYTAB_FILE> \
+        --principal <PRINCIPAL> \
+        --conf "spark.kafka.bootstrap.servers=<KAFKA_SERVERS>" \
+        ...
+
+- **Kerberos credential cache**, for example:
+
+    ./bin/spark-submit \
+        --conf "spark.kafka.bootstrap.servers=<KAFKA_SERVERS>" \
+        ...
+
+The Kafka delegation token provider can be turned off by setting
`spark.security.credentials.kafka.enabled` to `false` (default: `true`).
+Spark can be configured to use the following authentication protocols to
obtain a token (the protocol must match the
+Kafka broker configuration):
+- **SASL SSL (default)**
+- **SSL**
+- **SASL PLAINTEXT (for testing)**
+
+After obtaining the delegation token successfully, Spark distributes it across
nodes and renews it accordingly.
+Delegation tokens use the `SCRAM` login module for authentication, so the
appropriate
+`spark.kafka.sasl.token.mechanism` (default: `SCRAM-SHA-512`) has to be
configured. This parameter
+must also match the Kafka broker configuration.
+
+When a delegation token is available on an executor, it can be overridden with
a JAAS login configuration.
+
+##### Caveats
+
+- Obtaining a delegation token for a proxy user is not yet supported
([KAFKA-6945](https://issues.apache.org/jira/browse/KAFKA-6945)).
Review comment:
I'm planning to add the first caveat for Structured Streaming as well...