That would work. I normally use Hive queries through Spark SQL, and I haven't seen an option like that there.
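For what it's worth, a minimal sketch of the Spark SQL side (assuming a HiveContext and a placeholder table name "events", neither of which is from this thread): spark.sql.shuffle.partitions only controls the partition count after a shuffle (joins, aggregations), so forcing a specific count on the initial scan still takes an explicit repartition(), which is the expensive step being discussed.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("partition-sketch"))
val hiveContext = new HiveContext(sc)

// Controls how many partitions Spark SQL produces *after* a shuffle
// (joins, aggregations); it does not change how many partitions the
// initial table scan creates.
hiveContext.sql("SET spark.sql.shuffle.partitions=64")

// The scan itself follows the underlying file splits, so forcing an
// exact partition count afterwards requires a shuffle.
val rows = hiveContext.sql("SELECT * FROM events") // "events" is a placeholder
val balanced = rows.repartition(64)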
On Thu, Oct 2, 2014 at 3:13 PM, Ashish Jain <ashish....@gmail.com> wrote:
> If you are using textFile() to read data in, it also takes a parameter
> for the minimum number of partitions to create. Would that not work for
> you?
>
> On Oct 2, 2014 7:00 AM, "jamborta" <jambo...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I have been testing repartitioning to ensure that my algorithms get a
>> similar amount of data.
>>
>> I noticed that repartitioning is very expensive. Is there a way to force
>> Spark to create a certain number of partitions when the data is read in?
>> How does it decide on the partition size initially?
>>
>> Thanks,
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/partition-size-for-initial-read-tp15603.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
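A minimal sketch of Ashish's textFile() suggestion, assuming a plain SparkContext and a placeholder HDFS path:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("min-partitions-sketch"))

// The second argument is a *minimum* number of partitions: Spark asks the
// Hadoop input format for at least this many splits, so the data is divided
// at read time and no repartition() shuffle is needed afterwards.
val lines = sc.textFile("hdfs:///data/input.txt", 64) // path is a placeholder
println(lines.partitions.length) // >= 64, depending on the input splits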