GitHub user arucard21 commented on the issue:
https://github.com/apache/spark/pull/20731
I could try to create a simplified version of this image with just a few of
these relations. But since that would still be an image, it would still be
hard to update.
So I can remove the image and instead describe the context a bit more in
text. We already mention the possibility of using Hadoop modules for Cluster
Management and Distributed Storage. We could add that other options are
available to provide this functionality, which should give a more
comprehensive overview of the important entities that Spark interacts with.
(available APIs + modules to extend functionality + third-party modules for
specific functionality = all you need to run Spark)
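To make that point concrete, the docs could note that the choice of cluster
manager largely comes down to the master URL. A minimal sketch of my own (not
code from this PR; the app name is made up):

```scala
import org.apache.spark.sql.SparkSession

object ClusterManagerExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cluster-manager-example") // hypothetical app name
      // "local[*]" needs no external cluster manager; alternatives include
      // "yarn" (Hadoop YARN), "spark://host:7077" (standalone), or
      // "k8s://https://host:port" (Kubernetes), depending on the deployment.
      .master("local[*]")
      .getOrCreate()

    println(s"Running Spark ${spark.version}")
    spark.stop()
  }
}
```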
I'm not sure whether this adds enough value to the documentation (or whether
you think this additional information is needed at all). If not, let me know
and I can just close this PR.