srowen commented on a change in pull request #29320:
URL: https://github.com/apache/spark/pull/29320#discussion_r464008257



##########
File path: python/docs/source/index.rst
##########
@@ -21,8 +21,42 @@
 PySpark Documentation
 =====================
 
+PySpark is an interface for Apache Spark in Python language. It not only offers for you

Review comment:
       Use "in the Python language", or just "in Python".
   

##########
File path: python/docs/source/index.rst
##########
@@ -21,8 +21,42 @@
 PySpark Documentation
 =====================
 
+PySpark is an interface for Apache Spark in Python language. It not only offers for you
+to write an application in the Python APIs but also provides PySpark shell so you can
+interactively analyze your data in a distributed environment. PySpark supports most
+of Spark features such as Spark SQL, DataFrmae, Streaming, MLlib
+(Machine Learning) and Spark Core.
+
+.. image:: ../../../docs/img/pyspark-components.png
+  :alt: PySpark Compoenents
+
+**Spark SQL and DataFrame**
+
+Spark SQL is a Spark module for structured data processing. It provides
+a programming abstraction called DataFrame and can also act as distributed
+SQL query engine.
+
+**Streaming**
+
+Running on top of Spark, the streaming feature in Apache Spark enables powerful
+interactive and analytical applications across both streaming and historical data,
+while inheriting Spark’s ease of use and fault tolerance characteristics.
+
+**MLlib**
+
+Built on top of Spark, MLlib is a scalable machine learning library that provides
+a uniform set of high-level APIs that help users create and tune practical machine
+learning pipelines.
+
+**Spark Core**
+
+Spark Core is the underlying general execution engine for the Spark platform that all
+other functionality is built on top of. It provides an RDD (Resilient Disributed Data)

Review comment:
       Data -> Dataset (RDD stands for Resilient Distributed Dataset)

##########
File path: python/docs/source/index.rst
##########
@@ -21,8 +21,42 @@
 PySpark Documentation
 =====================
 
+PySpark is an interface for Apache Spark in Python language. It not only offers for you
+to write an application in the Python APIs but also provides PySpark shell so you can
+interactively analyze your data in a distributed environment. PySpark supports most
+of Spark features such as Spark SQL, DataFrmae, Streaming, MLlib

Review comment:
       most of Spark's features

##########
File path: python/docs/source/index.rst
##########
@@ -21,8 +21,42 @@
 PySpark Documentation
 =====================
 
+PySpark is an interface for Apache Spark in Python language. It not only offers for you
+to write an application in the Python APIs but also provides PySpark shell so you can

Review comment:
       Maybe "It not only allows you to write Spark applications using Python APIs, but also provides the PySpark shell for interactively analyzing ..."



