yaooqinn opened a new pull request, #46802:
URL: https://github.com/apache/spark/pull/46802
### What changes were proposed in this pull request?
In this PR, we improve the documentation and usage guide for the History Server
by:
- Identifying and printing **unrecognized options** specified by users
- Obtaining and printing all History Server-related configurations dynamically,
instead of relying on an incomplete, outdated hardcoded list
- Ensuring every configuration is documented in the usage guide
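The two behaviors above can be sketched roughly as follows. This is a minimal, hypothetical Scala sketch: `knownOptions`, `ConfEntry`, and `renderUsage` are illustrative names rather than Spark's actual `HistoryServerArguments` internals, and the registry holds only two abridged entries for demonstration.

```scala
object HistoryHelpSketch {
  // 1. Unrecognized options: any flag not in the known set is collected
  //    and reported, instead of being silently dropped.
  private val knownOptions = Set("--properties-file", "--help", "-h")

  def unrecognized(args: Seq[String]): Seq[String] =
    args.filter(a => a.startsWith("-") && !knownOptions.contains(a))

  // 2. Dynamic listing: entries are enumerated from a single registry, so
  //    the help text cannot drift out of sync with the configurations that
  //    actually exist. (Keys shown are real Spark configs; docs abridged.)
  final case class ConfEntry(key: String, default: String, doc: String)

  val entries: Seq[ConfEntry] = Seq(
    ConfEntry("spark.history.ui.port", "18080",
      "Web UI port to bind Spark History Server"),
    ConfEntry("spark.history.fs.safemodeCheck.interval", "5s",
      "Interval between checks for the HDFS safemode")
  )

  def renderUsage(): String =
    entries.sortBy(_.key).map { e =>
      s"  ${e.key}\n      ${e.doc}\n      (Default: ${e.default})"
    }.mkString("\n")

  def main(args: Array[String]): Unit = {
    val bad = unrecognized(args.toSeq)
    if (bad.nonEmpty) println(s"Unrecognized options: ${bad.mkString(" ")}")
    println("History Server options:")
    println(renderUsage())
  }
}
```

Because the usage text is generated from the registry itself, a newly added configuration shows up in `--help` automatically, with no hardcoded list to forget to update.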
### Why are the changes needed?
- Revise the help guide for the History Server to make it more user-friendly.
Configurations missing from the help guide are not always reachable in our
official documentation either; for example,
`spark.history.fs.safemodeCheck.interval` has been missing from the docs since
it was added in 1.6.
- Misuse, such as passing unrecognized options, should be reported to users
instead of being silently ignored.
### Does this PR introduce _any_ user-facing change?
No. The output format stays as-is; only the number of items printed increases.
### How was this patch tested?
#### Without this PR
```
Usage: ./sbin/start-history-server.sh [options]
24/05/30 15:37:23 INFO SignalUtils: Registering signal handler for TERM
24/05/30 15:37:23 INFO SignalUtils: Registering signal handler for HUP
24/05/30 15:37:23 INFO SignalUtils: Registering signal handler for INT

Options:
  --properties-file FILE      Path to a custom Spark properties file.
                              Default is conf/spark-defaults.conf.

Configuration options can be set by setting the corresponding JVM system
property.
History Server options are always available; additional options depend on
the provider.

History Server options:

  spark.history.ui.port
      Port where server will listen for connections (default 18080)
  spark.history.acls.enable
      Whether to enable view acls for all applications (default false)
  spark.history.provider
      Name of history provider class (defaults to file system-based provider)
  spark.history.retainedApplications
      Max number of application UIs to keep loaded in memory (default 50)

FsHistoryProvider options:

  spark.history.fs.logDirectory
      Directory where app logs are stored (default: file:/tmp/spark-events)
  spark.history.fs.update.interval
      How often to reload log data from storage (in seconds, default: 10)
```
#### With this PR: error for unrecognized options
```
Unrecognized options: --conf spark.history.ui.port=10000
Usage: HistoryServer [options]

Options:
  --properties-file FILE      Path to a custom Spark properties file.
                              Default is conf/spark-defaults.conf.
```
#### With this PR: help output
```
sbin/start-history-server.sh --help
Usage: ./sbin/start-history-server.sh [options]
{"ts":"2024-05-30T07:15:29.740Z","level":"INFO","msg":"Registering signal handler for TERM","context":{"signal":"TERM"},"logger":"SignalUtils"}
{"ts":"2024-05-30T07:15:29.741Z","level":"INFO","msg":"Registering signal handler for HUP","context":{"signal":"HUP"},"logger":"SignalUtils"}
{"ts":"2024-05-30T07:15:29.741Z","level":"INFO","msg":"Registering signal handler for INT","context":{"signal":"INT"},"logger":"SignalUtils"}

Options:
  --properties-file FILE      Path to a custom Spark properties file.
                              Default is conf/spark-defaults.conf.

Configuration options can be set by setting the corresponding JVM system
property.
History Server options are always available; additional options depend on
the provider.

History Server options:

  spark.history.custom.executor.log.url
      Specifies custom spark executor log url for supporting external log
      service instead of using cluster managers' application log urls in
      the history server. Spark will support some path variables via
      patterns which can vary on cluster manager. Please check the
      documentation for your cluster manager to see which patterns are
      supported, if any. This configuration has no effect on a live
      application, it only affects the history server.
      (Default: <undefined>)
  spark.history.custom.executor.log.url.applyIncompleteApplication
      Whether to apply custom executor log url, as specified by
      spark.history.custom.executor.log.url, to incomplete application as
      well. Even if this is true, this still only affects the behavior of
      the history server, not running spark applications.
      (Default: true)
  spark.history.kerberos.enabled
      Indicates whether the history server should use kerberos to login.
      This is required if the history server is accessing HDFS files on a
      secure Hadoop cluster.
      (Default: false)
  spark.history.kerberos.keytab
      When spark.history.kerberos.enabled=true, specifies location of the
      kerberos keytab file for the History Server.
      (Default: <undefined>)
  spark.history.kerberos.principal
      When spark.history.kerberos.enabled=true, specifies kerberos
      principal name for the History Server.
      (Default: <undefined>)
  spark.history.provider
      Name of the class implementing the application history backend.
      (Default: org.apache.spark.deploy.history.FsHistoryProvider)
  spark.history.retainedApplications
      The number of applications to retain UI data for in the cache. If
      this cap is exceeded, then the oldest applications will be removed
      from the cache. If an application is not in the cache, it will have
      to be loaded from disk if it is accessed from the UI.
      (Default: 50)
  spark.history.store.hybridStore.diskBackend
      Specifies a disk-based store used in hybrid store; ROCKSDB or
      LEVELDB (deprecated).
      (Default: ROCKSDB)
  spark.history.store.hybridStore.enabled
      Whether to use HybridStore as the store when parsing event logs.
      HybridStore will first write data to an in-memory store and having a
      background thread that dumps data to a disk store after the writing
      to in-memory store is completed.
      (Default: false)
  spark.history.store.hybridStore.maxMemoryUsage
      Maximum memory space that can be used to create HybridStore. The
      HybridStore co-uses the heap memory, so the heap memory should be
      increased through the memory option for SHS if the HybridStore is
      enabled.
      (Default: 2g)
  spark.history.store.maxDiskUsage
      Maximum disk usage for the local directory where the cache
      application history information are stored.
      (Default: 10g)
  spark.history.store.path
      Local directory where to cache application history information. By
      default this is not set, meaning all history information will be
      kept in memory.
      (Default: <undefined>)
  spark.history.store.serializer
      Serializer for writing/reading in-memory UI objects to/from
      disk-based KV Store; JSON or PROTOBUF. JSON serializer is the only
      choice before Spark 3.4.0, thus it is the default value. PROTOBUF
      serializer is fast and compact, and it is the default serializer for
      disk-based KV store of live UI.
      (Default: JSON)
  spark.history.ui.acls.enable
      Specifies whether ACLs should be checked to authorize users viewing
      the applications in the history server. If enabled, access control
      checks are performed regardless of what the individual applications
      had set for spark.ui.acls.enable. The application owner will always
      have authorization to view their own application and any users
      specified via spark.ui.view.acls and groups specified via
      spark.ui.view.acls.groups when the application was run will also
      have authorization to view that application. If disabled, no access
      control checks are made for any application UIs available through
      the history server.
      (Default: false)
  spark.history.ui.admin.acls
      Comma separated list of users that have view access to all the
      Spark applications in history server.
      (Default: )
  spark.history.ui.admin.acls.groups
      Comma separated list of groups that have view access to all the
      Spark applications in history server.
      (Default: )
  spark.history.ui.port
      Web UI port to bind Spark History Server
      (Default: 18080)

FsHistoryProvider options:

  spark.history.fs.cleaner.enabled
      Whether the History Server should periodically clean up event logs
      from storage
      (Default: false)
  spark.history.fs.cleaner.interval
      When spark.history.fs.cleaner.enabled=true, specifies how often the
      filesystem job history cleaner checks for files to delete.
      (Default: 1d)
  spark.history.fs.cleaner.maxAge
      When spark.history.fs.cleaner.enabled=true, history files older
      than this will be deleted when the filesystem history cleaner runs.
      (Default: 7d)
```
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]