That would work. I normally use Hive queries through Spark SQL, though,
and I have not seen anything like that there.
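What is available there, as far as I can tell, is the shuffle partition
setting plus an explicit repartition of the query result. A minimal
sketch (assuming Spark 1.1, an existing SparkContext named sc, and a
placeholder table name):

  import org.apache.spark.sql.hive.HiveContext

  val hc = new HiveContext(sc)
  // number of partitions used for shuffles in SQL queries (default 200)
  hc.setConf("spark.sql.shuffle.partitions", "64")
  // a query result is a SchemaRDD, i.e. a regular RDD[Row], so it can
  // be repartitioned after the fact, at the cost of a shuffle
  val rows = hc.sql("SELECT * FROM my_table").repartition(64)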

On Thu, Oct 2, 2014 at 3:13 PM, Ashish Jain <ashish....@gmail.com> wrote:
> If you are using textFile() to read the data in, it also takes a second
> parameter, the minimum number of partitions to create. Would that not
> work for you?
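>
> A minimal sketch of what I mean (the path and partition count are just
> placeholders):
>
>   // ask for at least 100 input splits when reading the file
>   val rdd = sc.textFile("hdfs:///data/input", 100)
>   rdd.partitions.size  // may still be higher, depending on the splits
>
> Note the second argument is only a minimum; Spark may create more
> partitions if the underlying input splits require it.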
>
> On Oct 2, 2014 7:00 AM, "jamborta" <jambo...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I have been testing repartitioning to ensure that my algorithms get
>> similar amounts of data.
>>
>> I have noticed that repartitioning is very expensive. Is there a way to
>> force Spark to create a certain number of partitions when the data is
>> read in? How does Spark decide the partition size initially?
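>>
>> For reference, a rough sketch of the kind of repartitioning I mean (the
>> path and partition count below are just placeholders):
>>
>>   val data = sc.textFile("hdfs:///data/input")
>>   // repartition() shuffles the full dataset across the cluster, which
>>   // is the expensive step I would like to avoid at read time
>>   val evened = data.repartition(200)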
>>
>> Thanks,
