Github user aarondav commented on a diff in the pull request:
https://github.com/apache/spark/pull/116#discussion_r10675622
--- Diff: bin/spark-shell ---
@@ -30,69 +30,378 @@ esac
# Enter posix mode for bash
set -o posix
-CORE_PATTERN="^[0-9]+$"
-MEM_PATTERN="^[0-9]+[m|g|M|G]$"
-
+## Global script variables
FWDIR="$(cd `dirname $0`/..; pwd)"
-if [ "$1" = "--help" ] || [ "$1" = "-h" ]; then
- echo "Usage: spark-shell [OPTIONS]"
- echo "OPTIONS:"
- echo "-c --cores num, the maximum number of cores to be used by the spark shell"
- echo "-em --execmem num[m|g], the memory used by each executor of spark shell"
- echo "-dm --drivermem num[m|g], the memory used by the spark shell and driver"
- echo "-h --help, print this help information"
- exit
-fi
+VERBOSE=0
+DRY_RUN=0
+SPARK_REPL_OPTS="${SPARK_REPL_OPTS:-""}"
+MASTER=""
+
+#CLI Color Templates
+txtund=$(tput sgr 0 1) # Underline
+txtbld=$(tput bold) # Bold
+bldred=${txtbld}$(tput setaf 1) # red
+bldyel=${txtbld}$(tput setaf 3) # yellow
+bldblu=${txtbld}$(tput setaf 4) # blue
+bldwht=${txtbld}$(tput setaf 7) # white
+txtrst=$(tput sgr0) # Reset
+info=${bldwht}*${txtrst} # Feedback
+pass=${bldblu}*${txtrst}
+warn=${bldred}*${txtrst}
+ques=${bldblu}?${txtrst}
+
+# Helper function to describe the script usage
+function usage() {
+ cat << EOF
+
+${txtbld}Usage${txtrst}: spark-shell [OPTIONS]
+
+${txtbld}OPTIONS${txtrst}:
+
+${txtund}Basic${txtrst}:
+
+ -h --help : Print this help information.
+ -c --executor-cores : The maximum number of cores to be used by the Spark Shell.
+ -em --executor-memory : The memory used by each executor of the Spark Shell; the number is followed by m for megabytes or g for gigabytes, e.g. "1g".
+ -dm --driver-memory : The memory used by the Spark Shell; the number is followed by m for megabytes or g for gigabytes, e.g. "1g".
+
+${txtund}Soon to be deprecated${txtrst}:
+
+ --cores : please use -c/--executor-cores
+
+${txtund}Other options${txtrst}:
+
+ -mip --master-ip : The Spark Master ip/hostname.
+ -mp --master-port : The Spark Master port.
+ -m --master : A full string that describes the Spark Master, e.g. "local" or "spark://localhost:7077".
+ -ld --local-dir : The absolute path to a local directory that will be used for "scratch" space in Spark.
--- End diff --
I think we should avoid replicating too many configuration options, especially those that can be configured as part of the Spark properties, since those should probably be set in spark-env.sh for use across your entire cluster. It is definitely worthwhile to be able to configure environment variables, particularly those that are used only by the shell and nothing else, but I think there are a few properties which we don't need command-line options for:
- local-dir
- locality-wait
- schedule-fair
- max-failures
- mesos-coarse (this is only useful for the Mesos scheduler and is probably a pretty serious decision one should make, rather than just being a transient option per shell)

Please feel free to fight back if you disagree on some of these options -- I just feel they don't need to be configured often enough to warrant a command-line option, or that they have implications beyond a single shell session.
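
To make the alternative concrete, here is a rough sketch of how settings like these could be set cluster-wide via conf/spark-env.sh rather than as per-shell flags. spark.local.dir and spark.locality.wait are real Spark property names; the /mnt/spark path and the 3000 ms wait are purely illustrative values, not recommendations:

```shell
# Sketch of conf/spark-env.sh entries (values are placeholders).
# Properties set here apply to every shell and job launched on the machine,
# instead of being re-specified per spark-shell invocation.
export SPARK_JAVA_OPTS="-Dspark.local.dir=/mnt/spark -Dspark.locality.wait=3000 $SPARK_JAVA_OPTS"

echo "$SPARK_JAVA_OPTS"
```

Since spark-shell sources spark-env.sh, the shell would pick these up with no extra command-line surface at all.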