Thanks for the useful comment. But I guess this setting applies only when I use Spark SQL, right? Is there any similar setting for plain Spark?
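For what it's worth, on the core (non-SQL) RDD side the closest equivalent I'm aware of is spark.default.parallelism, which sets the default number of partitions used by shuffles, plus explicit repartitioning per RDD. The snippet below is only a rough, untested sketch assuming the standard Scala RDD API; the app name, path and partition counts are placeholders, not anything from this thread:

    import org.apache.spark.{SparkConf, SparkContext}

    object PartitionExample {
      def main(args: Array[String]): Unit = {
        // spark.default.parallelism is the core-Spark analogue of
        // spark.sql.shuffle.partitions: the default partition count for RDD shuffles.
        val conf = new SparkConf()
          .setAppName("partition-example")
          .set("spark.default.parallelism", "32")
        val sc = new SparkContext(conf)

        // Partition counts can also be set per operation:
        val lines = sc.textFile("hdfs:///some/path", 64)   // minimum partitions when loading
        val counts = lines
          .flatMap(_.split(" "))
          .map(w => (w, 1))
          .reduceByKey(_ + _, 64)                          // shuffle into 64 partitions

        // Or reshape an existing RDD explicitly into more (smaller) partitions:
        val reshaped = counts.repartition(128)

        println(reshaped.count())
        sc.stop()
      }
    }

The same property can also be passed at submission time, e.g. spark-submit --conf spark.default.parallelism=32, rather than through spark-env.sh.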
best,
/Shahab

On Tue, Oct 28, 2014 at 2:38 PM, Wanda Hawk <[email protected]> wrote:

> Is this what you are looking for?
>
> In Shark, the default reducer number is 1 and is controlled by the property
> mapred.reduce.tasks. Spark SQL deprecates this property in favor of
> spark.sql.shuffle.partitions, whose default value is 200. Users may
> customize this property via SET:
>
> SET spark.sql.shuffle.partitions=10;
> SELECT page, count(*) c
> FROM logs_last_month_cached
> GROUP BY page ORDER BY c DESC LIMIT 10;
>
> Spark SQL Programming Guide - Spark 1.1.0 Documentation
> <http://spark.apache.org/docs/latest/sql-programming-guide.html>
>
> ------------------------------
> *From:* shahab <[email protected]>
> *To:* [email protected]
> *Sent:* Tuesday, October 28, 2014 3:20 PM
> *Subject:* How can number of partitions be set in "spark-env.sh"?
>
> I am running a standalone Spark cluster with 2 workers, each with 2 cores.
> Apparently I am loading and processing a relatively large chunk of data, so
> I receive the task failure " ". From what I read in some posts and
> discussions on the mailing list, the failures could be related to the large
> amount of data processed per partition, and if I have understood correctly
> I should use smaller (but more numerous) partitions?!
>
> Is there any way to set the number of partitions dynamically in
> "spark-env.sh" or in the submitted Spark application?
>
> best,
> /Shahab
