repartition() puts all values with the same key in one partition, but
several other keys can end up in that same partition. If you want to handle
each key separately, it sounds like you want groupBy, not repartition.
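
A minimal sketch of the difference, assuming a local SparkSession and a
hypothetical `value: Int` field standing in for the columns elided from the
original case class. The object name and both helper functions are invented
here for illustration only; `groupByKey` is the typed Dataset counterpart of
groupBy.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical stand-in for the poster's type: only `col` is from the thread,
// `value` is an assumed extra field.
case class Data(col: String, value: Int)

object GroupVsRepartition {

  // Returns, for each partition, the distinct keys found in it after
  // repartitioning by `col`. All rows of one key share a partition, but one
  // partition may hold several keys.
  def keysPerPartition(ds: Dataset[Data], numPartitions: Int): Seq[Seq[String]] = {
    import ds.sparkSession.implicits._
    ds.repartition(numPartitions, $"col")
      .mapPartitions(rows => Iterator(rows.map(_.col).toSet.toSeq))
      .collect()
      .toSeq
  }

  // groupByKey + mapGroups calls the function once per key, so every key is
  // handled on its own regardless of how the data is partitioned.
  def sumPerKey(ds: Dataset[Data]): Map[String, Int] = {
    import ds.sparkSession.implicits._
    ds.groupByKey(_.col)
      .mapGroups((key, rows) => (key, rows.map(_.value).sum))
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("group-vs-repartition")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(Data("a", 1), Data("b", 2), Data("a", 3), Data("c", 4)).toDS()

    // Three keys spread over two partitions: at least one partition
    // necessarily holds more than one key.
    keysPerPartition(ds, 2).foreach(keys => println(s"partition keys: $keys"))

    // One result per key.
    sumPerKey(ds).foreach { case (k, total) => println(s"$k -> $total") }

    spark.stop()
  }
}
```

If the per-key work is a plain aggregation, `ds.groupBy("col").agg(...)` is
usually the cheaper choice, since `mapGroups` has to pass every row of a group
through the user function.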
On Mon, Jun 20, 2022 at 8:26 AM DESCOTTE Loic - externe
wrote:
> Hi,
>
>
>
> I have a
Hi,
I have a data type like this:
case class Data(col: String, ...)
and a Dataset[Data] ds. Some rows have col set to 'a', others have it set
to 'b', etc.
I want to process all the data with an 'a' separately from all the data with a 'b'. But
I also need to have all the 'a' in the sam
Hi Team,
Can somebody help?
Thanks,
Sid
On Sun, Jun 19, 2022 at 3:51 PM Sid wrote:
> Hi,
>
> I already have a partitioned JSON dataset in S3, laid out like this:
>
> edl_timestamp=2022090800
>
> Now, the problem is, in the earlier 10 days of data collection there was a
> duplicate columns issue