unsubscribe

2021-10-27 Thread Sainath Palla
unsubscribe


Apache Spark 3.2.0 Voluntary Product Accessibility Template (VPAT)

2021-10-27 Thread Benjamin Murphy - IQGA-C
To whom it may concern,

Your products, Apache Spark 3.2.0 and Apache Scala 3.0, are candidate
technology for use within the U.S. General Services Administration (GSA)
enterprise environment. Technologies under review by GSA’s Office of the
Chief Technology Officer (OCTO) must be accompanied by a completed
Voluntary Product Accessibility Template (VPAT). The template and additional
Section 508 program information may be found at:
https://www.section508.gov/sell/vpat. Once the VPAT is completed, please
return it in PDF or .doc format to the contact information provided below.

If this artifact is available via the web, please provide a download link.
In the event that your organization declines to complete the VPAT/ACR,
please notify me at the contact information provided below.

Very Respectfully,
Ben and team!

--

BENJAMIN MURPHY | *Technical Project Manager / Scrum Master*

GSA COMET Hale Bopp Contract
Mobile:  703-888-9722 |  Email: benjamin.mur...@gsa.gov


Shuffle in Spark with Kubernetes

2021-10-27 Thread Mich Talebzadeh
As I understand it, Spark 3.x releases do not currently support the external
shuffle service on Kubernetes. Is there a timeline for when this could be
available?

For now we have two parameters for dynamic resource allocation:

 --conf spark.dynamicAllocation.enabled=true \
 --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
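
For anyone scripting their submissions, here is a minimal sketch of how the
two flags above could be assembled programmatically. The helper names
(dynamic_allocation_flags, render) are hypothetical and only for
illustration. The optional minExecutors/maxExecutors keys are standard Spark
properties, but they are my addition and not part of the original two
parameters.

```python
import shlex


def dynamic_allocation_flags(min_executors=None, max_executors=None):
    """Build spark-submit --conf flags for dynamic allocation with shuffle tracking.

    Hypothetical helper: only the first two properties come from the post above;
    the executor bounds are optional extras.
    """
    conf = {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    }
    if min_executors is not None:
        conf["spark.dynamicAllocation.minExecutors"] = str(min_executors)
    if max_executors is not None:
        conf["spark.dynamicAllocation.maxExecutors"] = str(max_executors)

    flags = []
    for key, value in conf.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


def render(flags):
    """Join the flags into a shell-safe string to append to a spark-submit command."""
    return " ".join(shlex.quote(flag) for flag in flags)


if __name__ == "__main__":
    print(render(dynamic_allocation_flags()))
```

Keeping the settings in one dictionary makes it easier to see which
properties are actually being passed when comparing runs.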


The idea is to use dynamic resource allocation where the driver tracks the
shuffle files and evicts only executors that are not storing active shuffle
files. In a nutshell, in the absence of an external shuffle service, these
shuffle files are stored on the executors themselves. The model works on the
basis of the "one-container-per-Pod" model, meaning that one node of the
cluster runs the driver and each remaining node runs one executor. If I
over-provision my GKE cluster, for example by adding one redundant node and
increasing the number of executors by one, it should improve the latency.
Have there been any benchmarks on this feature?
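
To make the eviction rule concrete, here is a toy model of it as I understand
it: an executor is a candidate for removal only if it is idle and holds no
shuffle data that is still needed. This is a simplified sketch, not Spark's
actual implementation; the Executor class, the removable_executors function
and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Executor:
    """Toy stand-in for an executor pod (hypothetical, not a Spark class)."""
    name: str
    idle: bool
    shuffle_ids: set = field(default_factory=set)  # shuffles whose files live here


def removable_executors(executors, active_shuffles):
    """Return names of idle executors that store no files for an active shuffle."""
    return [
        e.name
        for e in executors
        if e.idle and not (e.shuffle_ids & active_shuffles)
    ]


executors = [
    Executor("exec-1", idle=True, shuffle_ids={0}),   # only holds a finished shuffle
    Executor("exec-2", idle=True, shuffle_ids={1}),   # holds an active shuffle
    Executor("exec-3", idle=False),                   # still running tasks
    Executor("exec-4", idle=True),                    # idle, no shuffle files
]

if __name__ == "__main__":
    # Shuffle 1 is still needed by a downstream stage.
    print(removable_executors(executors, active_shuffles={1}))
    # ['exec-1', 'exec-4']
```

In this model exec-2 stays alive even though it is idle, which is the
trade-off of keeping shuffle files on the executors: scale-down is delayed
until the shuffle data is no longer needed.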


Thanks







*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.