Re: Usage of DropDuplicate in Spark

2021-06-22 Thread Chetan Khatri
This has been very slow.

On Tue, Jun 22, 2021 at 1:15 PM Sachit Murarka wrote:
> Hi Chetan,
>
> You can subtract the data frame or use the except operation.
> First DF contains the full rows.
> Second DF contains the unique rows (after removing duplicates).
> Subtract the second DF from the first.
>
> Hope this helps.

Re: Usage of DropDuplicate in Spark

2021-06-22 Thread Sachit Murarka
Hi Chetan,

You can subtract the data frames or use the except operation:
First DF contains the full rows.
Second DF contains the unique rows (after removing duplicates).
Subtract the second DF from the first.

Hope this helps.

Thanks,
Sachit

On Tue, Jun 22, 2021, 22:23 Chetan Khatri wrote:
> Hi Spark Users,
>
> I want
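A minimal PySpark sketch of the subtract approach, assuming Spark 2.4 or later (for DataFrame.exceptAll). The helper name split_duplicates and its keys parameter are hypothetical. One caveat: plain subtract() / except has EXCEPT DISTINCT semantics, so when the duplicates are exact full-row copies it returns nothing, because every copy also appears in the deduplicated DataFrame. exceptAll subtracts row by row and keeps the extra copies.

```python
# Sketch: return the rows dropDuplicates() removes, by diffing the input
# against its deduplicated version. Hypothetical helper: split_duplicates.
from typing import TYPE_CHECKING, List, Optional, Tuple

if TYPE_CHECKING:  # only for type hints; PySpark is not imported at runtime here
    from pyspark.sql import DataFrame


def split_duplicates(
    df: "DataFrame", keys: Optional[List[str]] = None
) -> Tuple["DataFrame", "DataFrame"]:
    """Return (kept, discarded); discarded holds the rows dropDuplicates removed."""
    # df is scanned twice below (dedup and diff), so cache it.
    df = df.cache()
    # keys=None deduplicates on all columns, like a bare dropDuplicates().
    # With a key subset, which row survives is not guaranteed to be the same
    # if Spark recomputes kept later; caching it makes that less likely.
    kept = df.dropDuplicates(keys).cache()
    # exceptAll (EXCEPT ALL) keeps multiplicity: a row present 3 times in df
    # and once in kept appears twice in discarded. subtract() would remove
    # every copy of any row found in kept, returning nothing for exact duplicates.
    discarded = df.exceptAll(kept)
    return kept, discarded
```

Usage, with a hypothetical audit table name: kept, discarded = split_duplicates(df, ["id"]) followed by discarded.write.mode("append").saveAsTable("dedup_audit"). This still reads the data more than once and aggregates for both the deduplication and the exceptAll, which is likely part of why it feels slow on large inputs.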

Usage of DropDuplicate in Spark

2021-06-22 Thread Chetan Khatri
Hi Spark Users,

I want to use dropDuplicates, but I would like to log the records it discards to the instrumentation table. What would be the best approach to do that?

Thanks

Re: Usage of DropDuplicate in Spark

2021-06-22 Thread Chetan Khatri
I am looking for a built-in API, if one exists.

On Tue, Jun 22, 2021 at 1:16 PM Chetan Khatri wrote:
> this has been very slow
>
> On Tue, Jun 22, 2021 at 1:15 PM Sachit Murarka wrote:
>
>> Hi Chetan,
>>
>> You can subtract the data frame or use the except operation.
>> First DF contains
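As far as I know, there is no built-in that returns the rows dropDuplicates() discards. A common alternative, sketched here under assumptions, is to number the rows within each key with ROW_NUMBER() and split on that number, so the kept and the discarded rows come from one window shuffle and an explicit ORDER BY decides which row survives (ties on the ordering column are broken arbitrarily). The helpers ranked_dedup_query and split_by_rank, the _rn column, and the order_col parameter (for example a last-updated timestamp) are hypothetical; the query itself is plain Spark SQL run through spark.sql().

```python
# Sketch: dedup and capture the discarded rows from one ranked query.
# Hypothetical names: ranked_dedup_query, split_by_rank, the _rn column.
from typing import TYPE_CHECKING, List, Tuple

if TYPE_CHECKING:  # only for type hints; PySpark is not imported at runtime here
    from pyspark.sql import DataFrame, SparkSession


def ranked_dedup_query(table: str, keys: List[str], order_col: str) -> str:
    """Build Spark SQL that numbers rows within each key; _rn = 1 is the row to keep."""
    partition = ", ".join(f"`{k}`" for k in keys)
    return (
        "SELECT *, ROW_NUMBER() OVER "
        f"(PARTITION BY {partition} ORDER BY `{order_col}` DESC) AS _rn "
        f"FROM {table}"
    )


def split_by_rank(
    spark: "SparkSession", table: str, keys: List[str], order_col: str
) -> Tuple["DataFrame", "DataFrame"]:
    """Return (kept, discarded) DataFrames computed from one ranked result."""
    # Cache the ranked rows so both filters below reuse a single window shuffle
    # instead of recomputing it for each output.
    ranked = spark.sql(ranked_dedup_query(table, keys, order_col)).cache()
    kept = ranked.where("_rn = 1").drop("_rn")       # newest row per key
    discarded = ranked.where("_rn > 1").drop("_rn")  # all other rows: the ones to log
    return kept, discarded
```

Usage, with hypothetical view, column, and table names: df.createOrReplaceTempView("events"), then kept, discarded = split_by_rank(spark, "events", ["id"], "updated_at"), then discarded.write.mode("append").saveAsTable("dedup_audit").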

Performance Problems Migrating to S3A Committers

2021-06-22 Thread Johnny Burns
Hello. I’m Johnny and I work at Stripe. We’re heavy Spark users and we’ve been exploring the S3A committers. Currently we first write the data to HDFS and then upload it to S3. However, now that S3 offers strong consistency guarantees, we are evaluating whether we can write data directly to S3. We’re

Any Other Options other than Spark IN Query

2021-06-22 Thread ranju goel
Hi All,

Please suggest what options Spark offers, other than IN queries, for fetching data from a DB. When I execute an IN query, all the data is fetched by a single executor into a single partition, and the load is not distributed to the other executors. Please suggest whether there are other
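One option, sketched here under assumptions: instead of one large IN (...) filter, pass several smaller WHERE clauses through the predicates argument of DataFrameReader.jdbc(). Spark creates one partition per predicate, so the fetch can be spread across tasks on different executors. The helper names in_list_predicates and read_in_parallel, the restriction to numeric key values, and the chunk size are hypothetical. The predicates must not overlap, or the same rows are read twice, which is why the values are de-duplicated first.

```python
# Sketch: turn one large IN-list query into several JDBC partitions.
# Hypothetical names: in_list_predicates, read_in_parallel.
from typing import TYPE_CHECKING, Dict, List, Sequence

if TYPE_CHECKING:  # only for type hints; PySpark is not imported at runtime here
    from pyspark.sql import DataFrame, SparkSession


def in_list_predicates(column: str, values: Sequence[int], chunk_size: int) -> List[str]:
    """Split the IN-list values into disjoint chunks, one WHERE clause per chunk."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # De-duplicate so no value lands in two predicates (rows would be read twice).
    # int() limits this sketch to numeric keys, which also sidesteps SQL quoting.
    unique = sorted({int(v) for v in values})
    chunks = [unique[i:i + chunk_size] for i in range(0, len(unique), chunk_size)]
    return [f"{column} IN ({', '.join(map(str, chunk))})" for chunk in chunks]


def read_in_parallel(
    spark: "SparkSession",
    url: str,
    table: str,
    column: str,
    values: Sequence[int],
    chunk_size: int,
    properties: Dict[str, str],
) -> "DataFrame":
    """Read the matching rows with one Spark partition (and DB query) per predicate."""
    predicates = in_list_predicates(column, values, chunk_size)
    return spark.read.jdbc(url=url, table=table, predicates=predicates, properties=properties)
```

For a numeric column, the JDBC source's partitionColumn, lowerBound, upperBound and numPartitions options are the other built-in way to split a read; note that lowerBound and upperBound only set the partition stride and do not filter rows.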