Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/2964#discussion_r19707119
--- Diff: docs/configuration.md ---
@@ -21,16 +21,22 @@ application. These properties can be set directly on a
[SparkConf](api/scala/index.html#org.apache.spark.SparkConf) passed to your
`SparkContext`. `SparkConf` allows you to configure some of the common properties
(e.g. master URL and application name), as well as arbitrary key-value pairs through the
-`set()` method. For example, we could initialize an application as follows:
+`set()` method. For example, we could initialize an application with one worker as follows:
+
+Note that we run with local[2], meaning two threads - which represents "minimal" parallelism,
+which can help detect bugs that only exist when we run in a distributed context.
{% highlight scala %}
val conf = new SparkConf()
- .setMaster("local")
+ .setMaster("local[2]")
.setAppName("CountingSheep")
.set("spark.executor.memory", "1g")
val sc = new SparkContext(conf)
{% endhighlight %}
+Note that we can have more than 1 worker in local mode, and in cases like spark streaming, we may actually
+require one to prevent any sort of starvation issues.
--- End diff --
One note on this: the threads shouldn't be called "workers", since that means something else in our distributed cluster mode. It's better to call them threads here.

Also, on this line, capitalize Spark Streaming.