GitHub user saturday-shi commented on the issue:
https://github.com/apache/spark/pull/18230
@jerryshao
> "reload" here means retrieving the `SparkConf` back from the checkpoint file
and using the retrieved `SparkConf` to create the `SparkContext` when restarting
the streaming application.
That explanation is wrong, but your understanding of what the
`propertiesToReload` list does is right.
After restarting from a checkpoint, the properties in `SparkConf` will be the
same as in the previous application. But properties like `spark.yarn.app.id`
will be stale and useless in a restarted app. So after retrieving the
`SparkConf` from the checkpoint, we want to "reload" fresh values for those
properties from the system properties, instead of using the old ones from the
checkpoint.
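A minimal sketch of that "reload" step, treating the conf as a plain key/value map (the `propertiesToReload` name comes from this PR; the key list and the helper itself are my own illustration, not Spark's actual implementation):

```scala
// Hypothetical sketch: refresh stale checkpointed keys from system properties.
object ReloadSketch {
  // Keys whose checkpointed values are stale in a restarted application
  // (illustrative list, not the one hard-coded in Spark).
  val propertiesToReload: Seq[String] = Seq("spark.yarn.app.id")

  // Start from the conf restored from the checkpoint, then overwrite each
  // stale key with whatever the current JVM's system properties hold.
  def reload(checkpointed: Map[String, String],
             sysProps: Map[String, String]): Map[String, String] =
    propertiesToReload.foldLeft(checkpointed) { (conf, key) =>
      sysProps.get(key) match {
        case Some(fresh) => conf + (key -> fresh) // fresh value wins
        case None        => conf - key            // no fresh value: drop the stale one
      }
    }
}
```

With this, a restarted app keeps its normal checkpointed settings but picks up the new application's id instead of the previous one's.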
@vanzin
> So if you start the second streaming application without providing
principal / keytab, Client.scala will not overwrite the credential file path,
but still the AM will start the credential updater, because the file location
is in the configuration read from the checkpoint.
That would be right in general, but it isn't the case here. I do submit the
principal & keytab when restarting, and the AM does renew the token using the
principal successfully.
I noticed that the `SparkConf` used by `AMCredentialRenewer` and the one used
by `CredentialUpdater` are NOT THE SAME. The credential renewer thread launched
by the AM works correctly, but the credential updater in the executor backend -
which uses the configs provided by the driver - gets confused and fails at its
job. So fixing only the AM code doesn't help much.
FYI, the log of `AMCredentialRenewer` looks like this:
```
17/06/07 15:11:14 INFO security.AMCredentialRenewer: Scheduling login from
keytab in 96952 millis.
...
17/06/07 15:12:51 INFO security.AMCredentialRenewer: Attempting to login to
KDC using principal: [email protected]
17/06/07 15:12:51 INFO security.AMCredentialRenewer: Successfully logged
into KDC.
...
17/06/07 15:12:53 INFO security.AMCredentialRenewer: Writing out delegation
tokens to
hdfs://nameservice1/user/xxx/.sparkStaging/application_1496384469444_0036/credentials-044b83ea-b46b-4bd4-8e98-0e38928fd58c-1496816091985-1.tmp
17/06/07 15:12:53 INFO security.AMCredentialRenewer: Delegation Tokens
written out successfully. Renaming file to
hdfs://nameservice1/user/xxx/.sparkStaging/application_1496384469444_0036/credentials-044b83ea-b46b-4bd4-8e98-0e38928fd58c-1496816091985-1
17/06/07 15:12:53 INFO security.AMCredentialRenewer: Delegation token file
rename complete.
17/06/07 15:12:53 INFO security.AMCredentialRenewer: Scheduling login from
keytab in 110925 millis.
...
```
It renews the token successfully and saves it to
application_1496384469444_0036's dir.
But the `CredentialUpdater` (started by `YarnSparkHadoopUtil`) complains
about this:
```
17/06/07 15:11:24 INFO executor.CoarseGrainedExecutorBackend: Will
periodically update credentials from:
hdfs://nameservice1/user/xxx/.sparkStaging/application_1496384469444_0035/credentials-19a7c11e-8c93-478c-ab0a-cdbfae5b2ae5
...
17/06/07 15:12:24 WARN yarn.YarnSparkHadoopUtil: Error while attempting to
list files from application staging dir
java.io.FileNotFoundException: File
hdfs://nameservice1/user/xxx/.sparkStaging/application_1496384469444_0035 does
not exist.
...
```
... which says that the credentials file doesn't exist in
application_1496384469444_0035's dir.
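The mismatch in the logs above can be sketched like this (the staging-dir layout and application ids are taken from the logs; the helper names and the use of the `spark.yarn.credentials.file` key are my own illustration of the problem, not Spark's code):

```scala
// Hypothetical sketch of why the renewer and the updater disagree.
object CredentialPathMismatch {
  // The AM derives the credentials location from the *current* application's
  // staging directory...
  def amCredentialsDir(stagingBase: String, appId: String): String =
    s"$stagingBase/$appId"

  // ...while the executor-side updater trusts the path shipped in the driver's
  // SparkConf, which after a checkpoint restore still names the *previous*
  // application's directory.
  def updaterCredentialsDir(checkpointedConf: Map[String, String]): Option[String] =
    checkpointedConf
      .get("spark.yarn.credentials.file")          // stale path from the checkpoint
      .map(_.split('/').dropRight(1).mkString("/")) // parent dir of the file
}
```

The renewer writes under `_0036`'s dir while the updater polls `_0035`'s (already deleted) dir, which matches the `FileNotFoundException` above.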