> …performance by a little, but not enough to saturate the cluster
> resources.
>
> Did I miss some more tuning parameters that could help?
> One obvious thing would be to scale the machines up vertically and use
> fewer nodes to minimize traffic, but 30 nodes doesn't seem like much,
> even considering 30x30 connections.
>
> Thanks in advance!
>
>
--
Vladimir Prus
http://vladimirprus.com
…at, where it is copied to spark.sql.sources.outputCommitterClass, and
that option, in turn, is only used by SQLHadoopMapReduceCommitProtocol,
which we don't use here.

So it sounds like setting parquet.output.committer.class to
org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter is no
longer necessary? Or is there some code path where it still matters?
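For context, the full wiring for the S3A committers is usually along
these lines (a sketch based on the Spark cloud-integration docs; whether
the parquet line is still needed is exactly the question above):

```properties
# spark-defaults.conf (sketch; requires the spark-hadoop-cloud module on the classpath)
spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a  org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
spark.sql.sources.commitProtocolClass     org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class  org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
```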
--
Vladimir Prus
http://vladimirprus.com
>> spark.hadoop.fs.s3a.connection.maximum config param from 200 to 400 or 900,
>> but it didn't reduce the S3 latency.
>>
>> Do you have any idea for the cause of the read latency from S3?
>>
>> I saw this post
>> <https://aws.amazon.com/premiumsupport/knowledge-center/s3-transfer-data-bucket-instance/>
>> about improving the transfer speed; is anything there relevant?
>>
>>
>> Thanks,
>> Tzahi
>>
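A few other s3a client settings commonly tuned for read latency with
columnar formats (key names are from the Hadoop S3A documentation; the
values below are only illustrative starting points, not recommendations
for this workload):

```properties
spark.hadoop.fs.s3a.connection.maximum          400
spark.hadoop.fs.s3a.threads.max                 256
spark.hadoop.fs.s3a.readahead.range             1M
# random I/O mode helps Parquet/ORC readers, which seek rather than stream:
spark.hadoop.fs.s3a.experimental.input.fadvise  random
```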
> --
Vladimir Prus
http://vladimirprus.com
> On Tue, Feb 9, 2021 at 10:44 PM Vladimir Prus wrote:
On 9 Feb 2021, at 19:46, Rishabh Jain wrote:
Hi,
We are trying to access S3 from a Spark job running in an EKS cluster pod.
I have a service account with an IAM role attached that grants full S3
permissions. We are using DefaultCredentialsProviderChain, but we are
still getting 403 Forbidden from S3.
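One frequent cause with IRSA (IAM roles for service accounts) is that the
s3a connector's default credential provider list does not try the pod's
web-identity token, so the service-account role is never assumed. A hedged
sketch of the usual fix (the class name is from the AWS SDK v1; verify it
against the SDK version bundled with your Hadoop build):

```properties
# Make s3a use the web-identity token that IRSA mounts into the pod.
spark.hadoop.fs.s3a.aws.credentials.provider  com.amazonaws.auth.WebIdentityTokenCredentialsProvider
```

The AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables
must also be visible to both driver and executor pods; the EKS webhook
normally injects them when the service account is annotated correctly.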
Hi,
If your data frame is partitioned by column A, and you want deduplication
by columns A, B and C, then a faster way might be to sort each partition
by A, B and C and then do a linear scan; that is often faster than
grouping by all columns, which requires a shuffle. Sadly, there's no
standard way to
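The sort-then-scan idea above can be sketched roughly like this (a sketch,
not the poster's code; `dedup_sorted` and the column names A, B, C are
illustrative):

```python
def dedup_sorted(rows, key=lambda r: (r[0], r[1], r[2])):
    """Drop duplicate rows from an iterator already sorted by the dedup
    key (A, B, C).  One linear pass with O(1) extra memory, so it can run
    inside mapPartitions without any shuffle."""
    prev = object()  # sentinel that never equals a real key
    for row in rows:
        k = key(row)
        if k != prev:
            yield row
        prev = k

# With the frame already partitioned by A, a per-partition sort suffices:
#   deduped = (df.sortWithinPartitions("A", "B", "C")
#                .rdd.mapPartitions(dedup_sorted))
```

Because the frame is partitioned by A, equal (A, B, C) keys can never land
in different partitions, so the per-partition scan finds every duplicate.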
…d to V2, and whether these limitations above will be fixed.
Thanks in advance,
--
Vladimir Prus
http://vladimirprus.com