The GitHub repo is https://github.com/datastax/spark-cassandra-connector

The talk video and slides should be uploaded soon on the Spark Summit website.
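
For reference, pulling the connector into an sbt build looks roughly like
this (version coordinates are an assumption for Spark 1.6.x; check the
repo's releases for the exact one):

  libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.0"

or with spark-shell, something like:

  spark-shell --packages datastax:spark-cassandra-connector:1.6.0-s_2.10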

On Wednesday, June 8, 2016, Chanh Le <giaosu...@gmail.com> wrote:

> Thanks, I'll look into it. Any luck getting a link related to it?
>
> On Thu, Jun 9, 2016, 12:43 PM Jasleen Kaur <jasleenkaur1...@gmail.com> wrote:
>
>> Try using the DataStax package. There was a great talk about it at Spark
>> Summit. It will take care of the boilerplate code so you can focus on
>> real business value.
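>>
>> For example, writing a DataFrame back out to Cassandra is a one-liner
>> instead of manual CQL (the table and keyspace names here are just
>> placeholders):
>>
>>   df.write
>>     .format("org.apache.spark.sql.cassandra")
>>     .options(Map("table" -> "daily_report", "keyspace" -> "reports"))
>>     .save()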
>>
>> On Wednesday, June 8, 2016, Chanh Le <giaosu...@gmail.com> wrote:
>>
>>> Hi everyone,
>>> I tested repartitioning a DataFrame by columns, but the result looks
>>> wrong to me.
>>> I am using Spark 1.6.1 and loading data from Cassandra.
>>> If I repartition by two fields (date, network_id), I get 200 partitions.
>>> If I repartition by one field (date), I still get 200 partitions.
>>> But my data covers 90 days, so I would expect repartitioning by date to
>>> give 90 partitions.
>>>
>>> import org.apache.spark.sql.functions.col
>>>
>>> // sql here is a SQLContext
>>> val daily = sql
>>>   .read
>>>   .format("org.apache.spark.sql.cassandra")
>>>   .options(Map("table" -> dailyDetailTableName, "keyspace" -> reportSpace))
>>>   .load()
>>>   .repartition(col("date"))
>>>
>>> No matter which columns I pass to repartition, the number of partitions
>>> doesn't change.
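>>>
>>> One guess: repartition(cols) with no explicit count seems to fall back
>>> to spark.sql.shuffle.partitions, which defaults to 200. If so, a sketch
>>> like this should give one partition per day (assuming exactly 90
>>> distinct dates; the explicit count is the part I haven't verified):
>>>
>>>   // pass the target partition count along with the column
>>>   val daily90 = daily.repartition(90, col("date"))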
>>>
>>> Does anyone have the same problem?
>>>
>>> Thanks in advance.
>>>
>>
