Hi,

We are trying to load ORC data (around 50 GB) stored on S3 into Ignite from
Spark using the DataFrame API. The load starts fast with good write throughput,
but after some time the throughput drops and the load gets stuck.

We also tried changing several configuration settings, but with no luck:
1. enabling checkpoint write throttling
2. disabling throttling and increasing the checkpoint buffer size

Please find below the configuration and properties of the cluster:

   1. 10-node cluster of r4.4xlarge instances (AWS EMR), shared with Spark
   2. Ignite is started with -Xms20g -Xmx30g
   3. Cache mode is PARTITIONED
   4. Persistence is enabled
   5. Direct I/O is enabled
   6. No backups

<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <!-- Write throttling (currently disabled). -->
        <property name="writeThrottlingEnabled" value="false"/>
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="persistenceEnabled" value="true"/>
                <property name="checkpointPageBufferSize" value="#{20L * 1024 * 1024 * 1024}"/>
                <property name="name" value="Default_Region"/>
                <property name="maxSize" value="#{60L * 1024 * 1024 * 1024}"/>
            </bean>
        </property>
        <property name="walMode" value="NONE"/>
    </bean>
</property>
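
As a rough sanity check on these settings, the per-node memory they imply can be tallied as below. This is only a sketch. It assumes an r4.4xlarge instance has 122 GiB of RAM and that the checkpoint page buffer is allocated on top of the region's maxSize. The variable names are illustrative and not part of any Ignite API.

```python
# Per-node memory implied by the settings in this post.
GIB = 1024 ** 3

heap_max = 30 * GIB            # -Xmx30g
data_region_max = 60 * GIB     # maxSize of Default_Region
checkpoint_buffer = 20 * GIB   # checkpointPageBufferSize

# Assumption: an r4.4xlarge instance provides 122 GiB of RAM.
node_ram = 122 * GIB

# Assumption: the checkpoint buffer is counted in addition to maxSize.
ignite_total = heap_max + data_region_max + checkpoint_buffer
left_for_rest = node_ram - ignite_total

print(f"Ignite per node: {ignite_total // GIB} GiB")
print(f"Left for Spark and the OS: {left_for_rest // GIB} GiB")
```

Under those assumptions Ignite alone can claim up to 110 GiB per node, which leaves about 12 GiB for the Spark executors and the operating system on the shared nodes.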

Thanks in advance,

Rahul Aneja
