Re: Check if shuffle is caused for repartitioned pyspark dataframes

2022-12-26 Thread Shivam Verma
you sort by the partition key to
> ensure the partitions match up? Thinking of olden times :)
>
> On Fri, Dec 23, 2022 at 4:42 AM Shivam Verma wrote:
>
>> Hi Gurunandan,
>>
>> Thanks for the reply!
>>
>> I do see the exchange operator in the SQL tab, bu

Re: Check if shuffle is caused for repartitioned pyspark dataframes

2022-12-23 Thread Shivam Verma
the executed plan and validate for Exchange
> Operator in the Physical Plan.
>
> On Wed, Dec 14, 2022 at 10:56 AM Shivam Verma wrote:
> >
> > Hello folks,
> >
> > I have a use case where I save two pyspark dataframes as parquet files
> and then use them lat

Check if shuffle is caused for repartitioned pyspark dataframes

2022-12-13 Thread Shivam Verma
Hello folks, I have a use case where I save two pyspark dataframes as parquet files and then use them later to join with each other or with other tables and perform multiple aggregations. Since I know the column being used in the downstream joins and groupby, I was hoping I could use