Thanks for the useful comment. But I guess this setting applies only when I use Spark SQL, right? Is there any similar setting for plain Spark?
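For what it's worth, on the core (non-SQL) RDD side the closest equivalent I'm aware of is spark.default.parallelism, which sets the default number of partitions used by shuffles, plus explicit repartitioning per RDD. The snippet below is only a rough, untested sketch assuming the standard Scala RDD API; the app name, path and partition counts are placeholders, not anything from this thread:

    import org.apache.spark.{SparkConf, SparkContext}

    object PartitionExample {
      def main(args: Array[String]): Unit = {
        // spark.default.parallelism is the core-Spark analogue of
        // spark.sql.shuffle.partitions: the default partition count for RDD shuffles.
        val conf = new SparkConf()
          .setAppName("partition-example")
          .set("spark.default.parallelism", "32")
        val sc = new SparkContext(conf)

        // Partition counts can also be set per operation:
        val lines = sc.textFile("hdfs:///some/path", 64)   // minimum partitions when loading
        val counts = lines
          .flatMap(_.split(" "))
          .map(w => (w, 1))
          .reduceByKey(_ + _, 64)                          // shuffle into 64 partitions

        // Or reshape an existing RDD explicitly into more (smaller) partitions:
        val reshaped = counts.repartition(128)

        println(reshaped.count())
        sc.stop()
      }
    }

The same property can also be passed at submission time, e.g. spark-submit --conf spark.default.parallelism=32, rather than through spark-env.sh.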
best,
/Shahab

On Tue, Oct 28, 2014 at 2:38 PM, Wanda Hawk <[email protected]> wrote:

> Is this what you are looking for?
>
> In Shark, the default reducer number is 1 and is controlled by the property
> mapred.reduce.tasks. Spark SQL deprecates this property in favor of
> spark.sql.shuffle.partitions, whose default value is 200. Users may
> customize this property via SET:
>
> SET spark.sql.shuffle.partitions=10;
> SELECT page, count(*) c
> FROM logs_last_month_cached
> GROUP BY page ORDER BY c DESC LIMIT 10;
>
> Spark SQL Programming Guide - Spark 1.1.0 Documentation
> <http://spark.apache.org/docs/latest/sql-programming-guide.html>
>
> ------------------------------
> *From:* shahab <[email protected]>
> *To:* [email protected]
> *Sent:* Tuesday, October 28, 2014 3:20 PM
> *Subject:* How can number of partitions be set in "spark-env.sh"?
>
> I am running a standalone Spark cluster with 2 workers, each with 2 cores.
> Apparently I am loading and processing a relatively large chunk of data, so
> I receive the task failure " ". From what I read in some posts and
> discussions on the mailing list, the failures could be related to the large
> amount of data processed per partition, and if I have understood correctly
> I should use smaller (but more numerous) partitions?!
>
> Is there any way to set the number of partitions dynamically in
> "spark-env.sh" or in the submitted Spark application?
>
> best,
> /Shahab
