date:20210622

Performance Problems Migrating to S3A Committers

2021-06-22 Thread Johnny Burns

Hello. I’m Johnny, and I work at Stripe. We’re heavy Spark users and we’ve been exploring S3 committers. Currently we first write the data to HDFS and then upload it to S3. However, now that S3 offers strong consistency guarantees, we are evaluating whether we can write data directly to S3. We’re

Re: Usage of DropDuplicate in Spark

2021-06-22 Thread Chetan Khatri

I am looking for a built-in API, if one exists at all. On Tue, Jun 22, 2021 at 1:16 PM Chetan Khatri wrote: > this has been very slow > > On Tue, Jun 22, 2021 at 1:15 PM Sachit Murarka > wrote: > >> Hi Chetan, >> >> You can subtract the data frame or use the except operation. >> First DF contains ful

Re: Usage of DropDuplicate in Spark

2021-06-22 Thread Chetan Khatri

This has been very slow. On Tue, Jun 22, 2021 at 1:15 PM Sachit Murarka wrote: > Hi Chetan, > > You can subtract the data frame or use the except operation. > First DF contains full rows. > Second DF contains unique rows (after removing duplicates). > Subtract first and second DF. > > hope this helps >

Re: Usage of DropDuplicate in Spark

2021-06-22 Thread Sachit Murarka

Hi Chetan, You can subtract the data frame or use the except operation. First DF contains full rows. Second DF contains unique rows (after removing duplicates). Subtract first and second DF. Hope this helps. Thanks, Sachit On Tue, Jun 22, 2021, 22:23 Chetan Khatri wrote: > Hi Spark Users, > > I want
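
A caveat on the subtract/except suggestion, sketched under assumptions rather than taken from the thread: in Spark, `subtract` has EXCEPT DISTINCT semantics, so subtracting the de-duplicated DataFrame from the full one returns no rows when duplicates are dropped over all columns, whereas a multiset difference keeps the surplus copies. The plain-Python model below uses lists of tuples as stand-ins for DataFrames; the helper names and sample rows are hypothetical.

```python
from collections import Counter


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    return list(dict.fromkeys(rows))


def except_distinct(left, right):
    """Model of EXCEPT DISTINCT: distinct left rows that are absent from right."""
    right_set = set(right)
    return [row for row in dict.fromkeys(left) if row not in right_set]


def except_all(left, right):
    """Model of a multiset difference: each right row cancels one left copy."""
    remaining = Counter(right)
    out = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1  # cancel exactly one matching copy
        else:
            out.append(row)      # surplus copy, i.e. a discarded duplicate
    return out


# Hypothetical sample data: ("a", 1) appears three times.
full = [("a", 1), ("b", 2), ("a", 1), ("a", 1), ("c", 3)]
unique = drop_duplicates(full)

discarded_distinct = except_distinct(full, unique)  # nothing survives
discarded_all = except_all(full, unique)            # the two extra copies
```

In PySpark the multiset variant would correspond to `df.exceptAll(df.dropDuplicates())`, assuming Spark 2.4 or later; the resulting rows could then be written to the logging table.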

Usage of DropDuplicate in Spark

2021-06-22 Thread Chetan Khatri

Hi Spark Users, I want to use DropDuplicate, but I would also like to log the records that I discard to the instrumental table. What would be the best approach to do that? Thanks

Any Other Options other than Spark IN Query

2021-06-22 Thread ranju goel

Hi All, Please suggest what options Spark offers, other than IN queries, for fetching data from a db. If I execute an IN query, all data is fetched to a single executor in a single partition, and the load is not distributed to other executors. Please suggest are there other possibilit
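
One possible direction, assuming the data is read over JDBC (the message only says "db"): `DataFrameReader.jdbc` accepts a `predicates` list, and each predicate becomes one partition, so a long IN list can be split into several smaller ones. The helper below is a hypothetical sketch that only builds the predicate strings; the column name `id` and the integer keys are illustrative assumptions.

```python
def in_predicates(column, values, chunks):
    """Split integer keys into at most `chunks` disjoint IN (...) predicates."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    values = list(values)
    if not values:
        return []
    size = -(-len(values) // chunks)  # ceiling division: keys per predicate
    predicates = []
    for start in range(0, len(values), size):
        part = values[start:start + size]
        # int() restricts this sketch to integer keys and avoids quoting issues.
        literals = ", ".join(str(int(v)) for v in part)
        predicates.append(f"{column} IN ({literals})")
    return predicates


# Hypothetical example: ten keys spread over four partitions.
predicates = in_predicates("id", range(1, 11), 4)
```

The resulting list could then be passed as `spark.read.jdbc(url, table, predicates=predicates, properties=props)`, where `url`, `table`, and `props` are placeholders for the actual connection details.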


6 matches
