Hello.
I’m Johnny, and I work at Stripe. We’re heavy Spark users and we’ve been
exploring the S3 committers. Currently we first write the data to HDFS
and then upload it to S3. However, now that S3 offers strong consistency
guarantees, we are evaluating whether we can write data directly to S3.
We’re
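For reference, here is a minimal sketch of the settings usually involved in writing directly to S3 through the S3A committers. It assumes Spark 3.x with the `spark-hadoop-cloud` module and a Hadoop 3.x S3A connector on the classpath. The helper names `s3a_committer_conf` and `build_session` are hypothetical, and the committer choice is illustrative, not a description of Stripe's setup.

```python
# Hypothetical helper: Spark settings that route DataFrame file output through
# one of the S3A committers (requires spark-hadoop-cloud at runtime).
S3A_COMMITTERS = ("directory", "partitioned", "magic")


def s3a_committer_conf(committer: str = "directory") -> dict:
    """Return Spark config entries selecting an S3A committer."""
    if committer not in S3A_COMMITTERS:
        raise ValueError(f"unknown S3A committer: {committer!r}")
    conf = {
        # Which S3A committer the Hadoop S3A filesystem should use.
        "spark.hadoop.fs.s3a.committer.name": committer,
        # Bind Spark SQL's commit protocol to Hadoop's PathOutputCommitter
        # factory so the S3A committer is used for DataFrame writes.
        "spark.sql.sources.commitProtocolClass":
            "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
        "spark.sql.parquet.output.committer.class":
            "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    }
    if committer == "magic":
        # The magic committer may need to be switched on explicitly,
        # depending on the Hadoop release.
        conf["spark.hadoop.fs.s3a.committer.magic.enabled"] = "true"
    return conf


def build_session(committer: str = "directory"):
    # Imported lazily so the config helper above works without PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("s3a-committer-sketch")
    for key, value in s3a_committer_conf(committer).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With such a session, a write like `df.write.parquet("s3a://<bucket>/<path>/")` would go through the selected committer instead of the rename-based default. The bucket and path are placeholders.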
I am looking for a built-in API, if one exists.
On Tue, Jun 22, 2021 at 1:16 PM Chetan Khatri
wrote:
This has been very slow.
On Tue, Jun 22, 2021 at 1:15 PM Sachit Murarka
wrote:
Hi Chetan,
You can subtract the DataFrames or use the except operation.
The first DF contains all rows.
The second DF contains only unique rows (after removing duplicates).
Subtract the second DF from the first.
Hope this helps.
Thanks
Sachit
On Tue, Jun 22, 2021, 22:23 Chetan Khatri
wrote:
Hi Spark Users,
I want to use dropDuplicates, but I would like to log the records it
discards to an instrumentation table.
What would be the best approach to do that?
Thanks
Hi All,
Please suggest what options Spark offers, other than IN queries, for
fetching data from a database.
When I execute an IN query, all the data is fetched by a single executor
into a single partition, and the load is not distributed to the other executors.
Please suggest are there other possibilit
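A plain JDBC read in Spark produces a single partition unless it is told how to split the query. Below is a minimal sketch of the two ways `DataFrameReader.jdbc` can parallelize a read. The first uses a numeric partition column with `lowerBound`, `upperBound`, and `numPartitions`. The second passes a list of `predicates`, one query per partition, which can also be used to break one large IN list into several smaller ones. The `id` column, the bounds, the partition count of 8, and the helper names are hypothetical. The connection URL, table, and properties are placeholders, and the JDBC driver must be on the classpath.

```python
def in_list_predicates(column: str, ids: list, num_partitions: int) -> list:
    """Split one large IN list over integer ids into per-partition predicates."""
    chunks = [ids[i::num_partitions] for i in range(num_partitions)]
    return [
        # int() guards against injecting non-numeric text into the SQL.
        f"{column} IN ({', '.join(str(int(v)) for v in chunk)})"
        for chunk in chunks
        if chunk  # skip empty chunks when there are fewer ids than partitions
    ]


def read_by_range(spark, url: str, table: str, props: dict):
    # Each of the 8 partitions issues its own query over a slice of "id".
    # lowerBound/upperBound only set the stride; rows outside them are still read.
    return spark.read.jdbc(
        url, table,
        column="id", lowerBound=1, upperBound=1_000_000, numPartitions=8,
        properties=props,
    )


def read_by_predicates(spark, url: str, table: str, props: dict, ids: list):
    # One partition, and one database query, per predicate string.
    return spark.read.jdbc(
        url, table,
        predicates=in_list_predicates("id", ids, 8),
        properties=props,
    )
```

The range option suits a roughly uniformly distributed numeric key. The predicates option gives full control over how rows are split, at the cost of building the predicates yourself.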