GitHub user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/766#discussion_r12655504
--- Diff: docs/streaming-programming-guide.md ---
@@ -956,9 +957,10 @@ before further processing.
 ### Level of Parallelism in Data Processing
 Cluster resources maybe under-utilized if the number of parallel tasks used in any stage of the
 computation is not high enough. For example, for distributed reduce operations like `reduceByKey`
-and `reduceByKeyAndWindow`, the default number of parallel tasks is 8. You can pass the level of
-parallelism as an argument (see the
-[`PairDStreamFunctions`](api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions)
+and `reduceByKeyAndWindow`, the default number of parallel tasks is decided by the [config property]
+(configuration.html#spark-properties) `spark.default.parallelism`. You can pass the level of
+parallelism as an argument (see the [`PairDStreamFunctions`]
--- End diff --
Remove the "the" before [`PairDStreamFunctions`].
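
For reference, here is a minimal sketch of what passing the level of parallelism as an argument looks like in Spark Streaming. The app name, socket source, and partition count of 16 are illustrative assumptions, not taken from the PR:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // PairDStreamFunctions implicits

// Hypothetical streaming word-count app, purely for illustration.
val conf = new SparkConf().setAppName("ParallelismExample")
val ssc = new StreamingContext(conf, Seconds(1))

val pairs = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map(word => (word, 1))

// Override the default (spark.default.parallelism) by passing the
// number of reduce tasks explicitly as the last argument.
val counts = pairs.reduceByKey(_ + _, 16)

// The windowed variant accepts the same numPartitions argument.
val windowedCounts = pairs.reduceByKeyAndWindow(
  (a: Int, b: Int) => a + b, Seconds(30), Seconds(10), 16)
```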