Repository: spark-website

Updated Branches:
  refs/heads/asf-site e95223137 -> 03485ecc8
Fix FAQ typo - Remove unnecessary occurrence of 'are'

Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/03485ecc
Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/03485ecc
Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/03485ecc

Branch: refs/heads/asf-site
Commit: 03485ecc8268b06b7b6fc274be3f674320387924
Parents: e952231
Author: Aayush Sarva <checkaay...@gmail.com>
Authored: Wed Jan 11 15:48:07 2017 +0530
Committer: Sean Owen <so...@cloudera.com>
Committed: Fri Jan 13 12:57:19 2017 +0000

----------------------------------------------------------------------
 faq.md        | 2 +-
 site/faq.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark-website/blob/03485ecc/faq.md
----------------------------------------------------------------------
diff --git a/faq.md b/faq.md
index 7b2fa15..694c263 100644
--- a/faq.md
+++ b/faq.md
@@ -19,7 +19,7 @@ Spark is a fast and general processing engine compatible with Hadoop data. It ca

 <p class="question">How large a cluster can Spark scale to?</p>

-<p class="answer">Many organizations run Spark on clusters of thousands of nodes. The largest cluster we are know has 8000. In terms of data size, Spark has been shown to work well up to petabytes. It has been used to sort 100 TB of data 3X faster than Hadoop MapReduce on 1/10th of the machines, <a href="http://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html">winning the 2014 Daytona GraySort Benchmark</a>, as well as to <a href="https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html">sort 1 PB</a>. Several production workloads <a href="http://databricks.com/blog/2014/08/14/mining-graph-data-with-spark-at-alibaba-taobao.html">use Spark to do ETL and data analysis on PBs of data</a>.</p>
+<p class="answer">Many organizations run Spark on clusters of thousands of nodes. The largest cluster we know has 8000 of them. In terms of data size, Spark has been shown to work well up to petabytes. It has been used to sort 100 TB of data 3X faster than Hadoop MapReduce on 1/10th of the machines, <a href="http://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html">winning the 2014 Daytona GraySort Benchmark</a>, as well as to <a href="https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html">sort 1 PB</a>. Several production workloads <a href="http://databricks.com/blog/2014/08/14/mining-graph-data-with-spark-at-alibaba-taobao.html">use Spark to do ETL and data analysis on PBs of data</a>.</p>

 <p class="question">Does my data need to fit in memory to use Spark?</p>

http://git-wip-us.apache.org/repos/asf/spark-website/blob/03485ecc/site/faq.html
----------------------------------------------------------------------
diff --git a/site/faq.html b/site/faq.html
index ed17df0..807803b 100644
--- a/site/faq.html
+++ b/site/faq.html
@@ -204,7 +204,7 @@ Spark is a fast and general processing engine compatible with Hadoop data. It ca

 <p class="answer">As of 2016, surveys show that more than 1000 organizations are using Spark in production.
 Some of them are listed on the <a href="/powered-by.html">Powered By page</a> and at the <a href="http://spark-summit.org">Spark Summit</a>.</p>

 <p class="question">How large a cluster can Spark scale to?</p>

-<p class="answer">Many organizations run Spark on clusters of thousands of nodes. The largest cluster we are know has 8000. In terms of data size, Spark has been shown to work well up to petabytes. It has been used to sort 100 TB of data 3X faster than Hadoop MapReduce on 1/10th of the machines, <a href="http://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html">winning the 2014 Daytona GraySort Benchmark</a>, as well as to <a href="https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html">sort 1 PB</a>. Several production workloads <a href="http://databricks.com/blog/2014/08/14/mining-graph-data-with-spark-at-alibaba-taobao.html">use Spark to do ETL and data analysis on PBs of data</a>.</p>
+<p class="answer">Many organizations run Spark on clusters of thousands of nodes. The largest cluster we know has 8000 of them. In terms of data size, Spark has been shown to work well up to petabytes. It has been used to sort 100 TB of data 3X faster than Hadoop MapReduce on 1/10th of the machines, <a href="http://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html">winning the 2014 Daytona GraySort Benchmark</a>, as well as to <a href="https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html">sort 1 PB</a>. Several production workloads <a href="http://databricks.com/blog/2014/08/14/mining-graph-data-with-spark-at-alibaba-taobao.html">use Spark to do ETL and data analysis on PBs of data</a>.</p>

 <p class="question">Does my data need to fit in memory to use Spark?</p>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org