RDDs are immutable. Running .repartition does not change the RDD in place; it returns *a new RDD* with more partitions, which you need to assign to a new variable.
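A minimal sketch of the fix, following the API used in this thread (variable names are illustrative):

```scala
// Sketch only, assuming the Spark SQL parquetFile API from the thread.
val file = sqlContext.parquetFile("hdfs://node1/user/hive/warehouse/file.parquet")

// Wrong: repartition's result is discarded, so `file` keeps its
// original partitioning (one partition per Parquet file).
file.repartition(127)

// Right: bind the new RDD that repartition returns and use that instead.
val repartitioned = file.repartition(127)
println(repartitioned.partitions.size)  // reflects the requested 127 partitions
```

Note that coalesce has the same behavior: it also returns a new RDD, and by default it can only decrease the partition count, which is why file.coalesce(100) appeared to do nothing.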
On Tue, Apr 14, 2015 at 3:59 AM, Masf <masfwo...@gmail.com> wrote:

> Hi.
>
> It doesn't work.
>
> val file = sqlContext.parquetFile("hdfs://node1/user/hive/warehouse/file.parquet")
> file.repartition(127)
>
> println(file.partitions.size.toString())   <------ Returns 27!!!!!
>
> Regards
>
>
> On Fri, Apr 10, 2015 at 4:50 PM, Felix C <felixcheun...@hotmail.com> wrote:
>
>> RDD.repartition(1000)?
>>
>> --- Original Message ---
>>
>> From: "Masf" <masfwo...@gmail.com>
>> Sent: April 9, 2015 11:45 PM
>> To: user@spark.apache.org
>> Subject: Increase partitions reading Parquet File
>>
>> Hi
>>
>> I have this statement:
>>
>> val file = sqlContext.parquetFile("hdfs://node1/user/hive/warehouse/file.parquet")
>>
>> This code generates as many partitions as there are files, so I want to
>> increase the number of partitions.
>> I've tested coalesce (file.coalesce(100)) but the number of partitions
>> doesn't change.
>>
>> How can I increase the number of partitions?
>>
>> Thanks
>>
>> --
>>
>> Regards.
>> Miguel Ángel
>
>
> --
>
> Regards.
> Miguel Ángel