Github user berngp commented on a diff in the pull request:
https://github.com/apache/spark/pull/116#discussion_r10684068
--- Diff: bin/spark-shell ---
@@ -30,69 +30,378 @@ esac
# Enter posix mode for bash
set -o posix
-CORE_PATTERN="^[0-9]+$"
-MEM_PATTERN="^[0-9]+[m|g|M|G]$"
-
+## Global script variables
FWDIR="$(cd `dirname $0`/..; pwd)"
-if [ "$1" = "--help" ] || [ "$1" = "-h" ]; then
- echo "Usage: spark-shell [OPTIONS]"
- echo "OPTIONS:"
- echo "-c --cores num, the maximum number of cores to be used by the spark shell"
- echo "-em --execmem num[m|g], the memory used by each executor of spark shell"
- echo "-dm --drivermem num[m|g], the memory used by the spark shell and driver"
- echo "-h --help, print this help information"
- exit
-fi
+VERBOSE=0
+DRY_RUN=0
+SPARK_REPL_OPTS="${SPARK_REPL_OPTS:-""}"
+MASTER=""
+
+#CLI Color Templates
+txtund=$(tput sgr 0 1) # Underline
+txtbld=$(tput bold) # Bold
+bldred=${txtbld}$(tput setaf 1) # red
+bldyel=${txtbld}$(tput setaf 3) # yellow
+bldblu=${txtbld}$(tput setaf 4) # blue
+bldwht=${txtbld}$(tput setaf 7) # white
+txtrst=$(tput sgr0) # Reset
+info=${bldwht}*${txtrst} # Feedback
+pass=${bldblu}*${txtrst}
+warn=${bldred}*${txtrst}
+ques=${bldblu}?${txtrst}
+
+# Helper function to describe the script usage
+function usage() {
+ cat << EOF
+
+${txtbld}Usage${txtrst}: spark-shell [OPTIONS]
+
+${txtbld}OPTIONS${txtrst}:
+
+${txtund}Basic${txtrst}:
+
+ -h --help : Print this help information.
+ -c --executor-cores : The maximum number of cores to be used by the Spark Shell.
+ -em --executor-memory : The memory used by each executor of the Spark Shell, the number is followed by m for megabytes or g for gigabytes, e.g. "1g".
+ -dm --driver-memory : The memory used by the Spark Shell, the number is followed by m for megabytes or g for gigabytes, e.g. "1g".
+
+${txtund}Soon to be deprecated${txtrst}:
+
+ --cores : please use -c/--executor-cores
+
+${txtund}Other options${txtrst}:
+
+ -mip --master-ip : The Spark Master ip/hostname.
+ -mp --master-port : The Spark Master port.
+ -m --master : A full string that describes the Spark Master, e.g. "local" or "spark://localhost:7077".
+ -ld --local-dir : The absolute path to a local directory that will be used for "scratch" space in Spark.
--- End diff --
I forgot to set some context for this change, so here it is.
Currently I am working on understanding the impact of different workloads
under different configuration setups. The Spark Shell is a pretty convenient
tool for doing so in a development/lab environment.
I added those flags to _"quickly"_ understand and assert the behavior of
the cluster with a specific workload under different configurations. In most
cases _"normal"_ users shouldn't fiddle with these settings, e.g. change from
`mesos-fine` to `mesos-coarse`, unless they want to test the effects of such
settings on their workloads.
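For context, a minimal sketch of the kind of option handling involved (hypothetical: the option names follow the usage text above, but the parsing loop and the mapping of `-c` onto the `spark.cores.max` property are assumptions, not the patch's actual code):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of option parsing in the spirit of this patch.
# Option names mirror the usage text; the -D mapping is an assumption.
MASTER=""
SPARK_REPL_OPTS="${SPARK_REPL_OPTS:-""}"

parse_args() {
  while (($#)); do
    case "$1" in
      -m|--master)
        shift
        MASTER="$1"    # e.g. "local" or "spark://localhost:7077"
        ;;
      -c|--executor-cores)
        shift
        # spark.cores.max caps the total cores a Spark app may use.
        SPARK_REPL_OPTS="$SPARK_REPL_OPTS -Dspark.cores.max=$1"
        ;;
    esac
    shift
  done
}

parse_args --master "spark://localhost:7077" -c 4
echo "master: $MASTER"
echo "opts:  $SPARK_REPL_OPTS"
```

With a loop like this, trying a workload against a different master or core budget is a one-flag change at the prompt rather than an edit to a config file.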
In the end, I see the Spark Shell as a REPL that should aid in the
development of Spark applications. In my mind it should be flexible and allow
users to play around with different setups so they can home in on the setup
their Spark applications should have. I might have gone overboard in making
local-dir a shell option, but in my case I am deployed on EC2 and want to
understand the impact of using a standard EBS volume vs. a high-IOPS EBS
volume vs. local storage, etc. The same goes for the other optional arguments added.
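As a hedged sketch of how a `-ld`/`--local-dir` flag could be threaded through to Spark's scratch-space setting (`spark.local.dir` is the real Spark property for shuffle/scratch files; the plumbing below is an assumption, not the patch's actual code):

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map a -ld/--local-dir flag onto spark.local.dir.
# The option name follows the usage text above; the plumbing is assumed.
SPARK_REPL_OPTS="${SPARK_REPL_OPTS:-""}"
LOCAL_DIR=""

set -- --local-dir /mnt/spark-scratch   # simulated command line

case "$1" in
  -ld|--local-dir)
    LOCAL_DIR="$2"
    ;;
esac

if [ -n "$LOCAL_DIR" ]; then
  # spark.local.dir controls where Spark writes shuffle/scratch files,
  # which is exactly what varies when comparing EBS vs. local disk.
  SPARK_REPL_OPTS="$SPARK_REPL_OPTS -Dspark.local.dir=$LOCAL_DIR"
fi

echo "opts: $SPARK_REPL_OPTS"
```

Pointing the flag at an EBS-backed vs. instance-store path is then enough to benchmark the storage tiers against each other.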
Please don't forget that pull requests that are "enhancements" are
basically suggestions. I can remove the arguments that don't make sense to
others, or give them a -X prefix, similar to what the JVM does, and hide them
from the normal help message.
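The -X idea could be sketched as follows (entirely hypothetical option names; only the `-X*` routing convention, borrowed from the JVM, is the point):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of routing experimental "-X" options separately,
# JVM-style: basic usage() omits them, an extended help would list them.
HIDDEN_OPTS=()
BASIC_OPTS=()

for arg in --executor-cores -Xlocal-dir -Xmaster-port --help; do
  case "$arg" in
    -X*) HIDDEN_OPTS+=("$arg") ;;   # experimental, hidden from basic help
    *)   BASIC_OPTS+=("$arg") ;;
  esac
done

echo "basic:  ${BASIC_OPTS[*]}"
echo "hidden: ${HIDDEN_OPTS[*]}"
```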