Re: [PR] [SPARK-45861][PYTHON][DOCS] Add user guide for dataframe creation [spark]

via GitHub Wed, 29 Nov 2023 13:06:29 -0800


allisonwang-db commented on code in PR #43897:
URL: https://github.com/apache/spark/pull/43897#discussion_r1409849131



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+==================
+DataFrame Creation

Review Comment:
   ```suggestion
   Creating DataFrames in PySpark
   ```



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+==================
+DataFrame Creation
+==================
+
+.. currentmodule:: pyspark.sql
+
+Creating through `createDataFrame`
+----------------------------------
+
+A PySpark :class:`DataFrame` can be created via :meth:`SparkSession.createDataFrame`, typically by passing
+a list of lists, tuples, dictionaries or :class:`Row` objects, a pandas :class:`pandas.DataFrame`,
+a NumPy :class:`numpy.ndarray` or a :class:`pyspark.RDD`.
+:meth:`SparkSession.createDataFrame` accepts a `schema` argument that specifies the schema of the :class:`DataFrame`.
+When it is omitted, PySpark infers the corresponding schema by taking a sample from the data.
+
+Creating a PySpark :class:`DataFrame` from a list of lists
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([['Alice', 1], ['Bob', 5]])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` from a list of tuples

Review Comment:
   Do we need to show this example? Which one is better? From lists or tuples? We should provide opinionated ways to create DataFrames.



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+==================
+DataFrame Creation
+==================
+
+.. currentmodule:: pyspark.sql
+
+Creating through `createDataFrame`
+----------------------------------
+
+A PySpark :class:`DataFrame` can be created via :meth:`SparkSession.createDataFrame`, typically by passing
+a list of lists, tuples, dictionaries or :class:`Row` objects, a pandas :class:`pandas.DataFrame`,
+a NumPy :class:`numpy.ndarray` or a :class:`pyspark.RDD`.
+:meth:`SparkSession.createDataFrame` accepts a `schema` argument that specifies the schema of the :class:`DataFrame`.
+When it is omitted, PySpark infers the corresponding schema by taking a sample from the data.

Review Comment:
   Personally, I think we don't need this section. We can directly dive into different ways to create data frames and add some explanations there.



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+==================
+DataFrame Creation
+==================
+

Review Comment:
   ```suggestion
   PySpark allows you to create DataFrames in several ways. Let's explore these methods with simple examples.
   ```



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+==================
+DataFrame Creation
+==================
+
+.. currentmodule:: pyspark.sql
+
+Creating through `createDataFrame`
+----------------------------------
+
+A PySpark :class:`DataFrame` can be created via :meth:`SparkSession.createDataFrame`, typically by passing
+a list of lists, tuples, dictionaries or :class:`Row` objects, a pandas :class:`pandas.DataFrame`,
+a NumPy :class:`numpy.ndarray` or a :class:`pyspark.RDD`.
+:meth:`SparkSession.createDataFrame` accepts a `schema` argument that specifies the schema of the :class:`DataFrame`.
+When it is omitted, PySpark infers the corresponding schema by taking a sample from the data.
+
+Creating a PySpark :class:`DataFrame` from a list of lists

Review Comment:
   ```suggestion
   Creating a :class:`DataFrame` from Lists
   ```



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+==================
+DataFrame Creation
+==================
+
+.. currentmodule:: pyspark.sql
+
+Creating through `createDataFrame`
+----------------------------------
+
+A PySpark :class:`DataFrame` can be created via :meth:`SparkSession.createDataFrame`, typically by passing
+a list of lists, tuples, dictionaries or :class:`Row` objects, a pandas :class:`pandas.DataFrame`,
+a NumPy :class:`numpy.ndarray` or a :class:`pyspark.RDD`.
+:meth:`SparkSession.createDataFrame` accepts a `schema` argument that specifies the schema of the :class:`DataFrame`.
+When it is omitted, PySpark infers the corresponding schema by taking a sample from the data.
+
+Creating a PySpark :class:`DataFrame` from a list of lists
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([['Alice', 1], ['Bob', 5]])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` from a list of tuples
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([('Alice', 1), ('Bob', 5)])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` with the explicit schema specified
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> from pyspark.sql.types import *
+    >>> schema = StructType([StructField("name", StringType(), True),
+    ...     StructField("age", IntegerType(), True)])

Review Comment:
   ```suggestion
       >>> schema = StructType([
       ...     StructField("name", StringType(), True),
       ...     StructField("age", IntegerType(), True)
       ... ])
   ```



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+==================
+DataFrame Creation
+==================
+
+.. currentmodule:: pyspark.sql
+
+Creating through `createDataFrame`
+----------------------------------
+
+A PySpark :class:`DataFrame` can be created via :meth:`SparkSession.createDataFrame`, typically by passing
+a list of lists, tuples, dictionaries or :class:`Row` objects, a pandas :class:`pandas.DataFrame`,
+a NumPy :class:`numpy.ndarray` or a :class:`pyspark.RDD`.
+:meth:`SparkSession.createDataFrame` accepts a `schema` argument that specifies the schema of the :class:`DataFrame`.
+When it is omitted, PySpark infers the corresponding schema by taking a sample from the data.
+
+Creating a PySpark :class:`DataFrame` from a list of lists
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([['Alice', 1], ['Bob', 5]])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` from a list of tuples
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([('Alice', 1), ('Bob', 5)])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` with the explicit schema specified
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> from pyspark.sql.types import *
+    >>> schema = StructType([StructField("name", StringType(), True),
+    ...     StructField("age", IntegerType(), True)])
+    >>> df = spark.createDataFrame([('Alice', 1), ('Bob', 5)], schema)
+    >>> df.show()
+    +-----+---+
+    | name|age|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` with the explicit DDL-formatted string schema specified

Review Comment:
   Let's combine this with the previous section



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+==================
+DataFrame Creation
+==================
+
+.. currentmodule:: pyspark.sql
+
+Creating through `createDataFrame`
+----------------------------------
+
+A PySpark :class:`DataFrame` can be created via :meth:`SparkSession.createDataFrame`, typically by passing
+a list of lists, tuples, dictionaries or :class:`Row` objects, a pandas :class:`pandas.DataFrame`,
+a NumPy :class:`numpy.ndarray` or a :class:`pyspark.RDD`.
+:meth:`SparkSession.createDataFrame` accepts a `schema` argument that specifies the schema of the :class:`DataFrame`.
+When it is omitted, PySpark infers the corresponding schema by taking a sample from the data.
+
+Creating a PySpark :class:`DataFrame` from a list of lists
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([['Alice', 1], ['Bob', 5]])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` from a list of tuples
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([('Alice', 1), ('Bob', 5)])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` with the explicit schema specified
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+

Review Comment:
   ```suggestion
   Define a schema and use it to create a DataFrame. A schema describes the column names and types.
   ```



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+==================
+DataFrame Creation
+==================
+
+.. currentmodule:: pyspark.sql
+
+Creating through `createDataFrame`
+----------------------------------
+
+A PySpark :class:`DataFrame` can be created via :meth:`SparkSession.createDataFrame`, typically by passing
+a list of lists, tuples, dictionaries or :class:`Row` objects, a pandas :class:`pandas.DataFrame`,
+a NumPy :class:`numpy.ndarray` or a :class:`pyspark.RDD`.
+:meth:`SparkSession.createDataFrame` accepts a `schema` argument that specifies the schema of the :class:`DataFrame`.
+When it is omitted, PySpark infers the corresponding schema by taking a sample from the data.
+
+Creating a PySpark :class:`DataFrame` from a list of lists
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([['Alice', 1], ['Bob', 5]])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` from a list of tuples
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([('Alice', 1), ('Bob', 5)])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` with the explicit schema specified

Review Comment:
   ```suggestion
   Creating a :class:`DataFrame` with a Specified Schema
   ```



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+==================
+DataFrame Creation
+==================
+
+.. currentmodule:: pyspark.sql
+
+Creating through `createDataFrame`
+----------------------------------
+
+A PySpark :class:`DataFrame` can be created via :meth:`SparkSession.createDataFrame`, typically by passing
+a list of lists, tuples, dictionaries or :class:`Row` objects, a pandas :class:`pandas.DataFrame`,
+a NumPy :class:`numpy.ndarray` or a :class:`pyspark.RDD`.
+:meth:`SparkSession.createDataFrame` accepts a `schema` argument that specifies the schema of the :class:`DataFrame`.
+When it is omitted, PySpark infers the corresponding schema by taking a sample from the data.
+
+Creating a PySpark :class:`DataFrame` from a list of lists
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([['Alice', 1], ['Bob', 5]])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|

Review Comment:
   Maybe we should highlight that when the schema is not provided, the resulting DataFrame has `_1` and `_2` as column names (this differs from pandas, for example).



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+==================
+DataFrame Creation
+==================
+
+.. currentmodule:: pyspark.sql
+
+Creating through `createDataFrame`
+----------------------------------
+
+A PySpark :class:`DataFrame` can be created via :meth:`SparkSession.createDataFrame`, typically by passing
+a list of lists, tuples, dictionaries or :class:`Row` objects, a pandas :class:`pandas.DataFrame`,
+a NumPy :class:`numpy.ndarray` or a :class:`pyspark.RDD`.
+:meth:`SparkSession.createDataFrame` accepts a `schema` argument that specifies the schema of the :class:`DataFrame`.
+When it is omitted, PySpark infers the corresponding schema by taking a sample from the data.
+
+Creating a PySpark :class:`DataFrame` from a list of lists
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([['Alice', 1], ['Bob', 5]])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` from a list of tuples
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([('Alice', 1), ('Bob', 5)])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` with the explicit schema specified
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> from pyspark.sql.types import *
+    >>> schema = StructType([StructField("name", StringType(), True),
+    ...     StructField("age", IntegerType(), True)])
+    >>> df = spark.createDataFrame([('Alice', 1), ('Bob', 5)], schema)
+    >>> df.show()
+    +-----+---+
+    | name|age|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` with the explicit DDL-formatted string schema specified
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([('Alice', 1), ('Bob', 5)], schema="name string, age int")
+    >>> df.show()
+    +-----+---+
+    | name|age|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` from a list of dictionaries
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([{'name': 'Alice', 'age': 1}])
+    >>> df.show()
+    +---+-----+
+    |age| name|
+    +---+-----+
+    |  1|Alice|
+    +---+-----+
+
+
+Creating a PySpark :class:`DataFrame` from a list of :class:`Row`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+

Review Comment:
   
   ```suggestion
   Use the Row type to define rows of a DataFrame.
   
   ```



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+==================
+DataFrame Creation
+==================
+
+.. currentmodule:: pyspark.sql
+
+Creating through `createDataFrame`
+----------------------------------
+
+A PySpark :class:`DataFrame` can be created via :meth:`SparkSession.createDataFrame`, typically by passing
+a list of lists, tuples, dictionaries or :class:`Row` objects, a pandas :class:`pandas.DataFrame`,
+a NumPy :class:`numpy.ndarray` or a :class:`pyspark.RDD`.
+:meth:`SparkSession.createDataFrame` accepts a `schema` argument that specifies the schema of the :class:`DataFrame`.
+When it is omitted, PySpark infers the corresponding schema by taking a sample from the data.
+
+Creating a PySpark :class:`DataFrame` from a list of lists
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([['Alice', 1], ['Bob', 5]])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` from a list of tuples
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([('Alice', 1), ('Bob', 5)])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` with the explicit schema specified
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> from pyspark.sql.types import *
+    >>> schema = StructType([StructField("name", StringType(), True),
+    ...     StructField("age", IntegerType(), True)])
+    >>> df = spark.createDataFrame([('Alice', 1), ('Bob', 5)], schema)
+    >>> df.show()
+    +-----+---+
+    | name|age|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` with the explicit DDL-formatted string schema specified
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([('Alice', 1), ('Bob', 5)], schema="name string, age int")
+    >>> df.show()
+    +-----+---+
+    | name|age|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` from a list of dictionaries
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([{'name': 'Alice', 'age': 1}])
+    >>> df.show()
+    +---+-----+
+    |age| name|
+    +---+-----+
+    |  1|Alice|
+    +---+-----+
+
+
+Creating a PySpark :class:`DataFrame` from a list of :class:`Row`

Review Comment:
   ```suggestion
   Creating a :class:`DataFrame` from :class:`Row` objects
   ```



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+==================
+DataFrame Creation
+==================
+
+.. currentmodule:: pyspark.sql
+
+Creating through `createDataFrame`
+----------------------------------
+
+A PySpark :class:`DataFrame` can be created via :meth:`SparkSession.createDataFrame`, typically by passing
+a list of lists, tuples, dictionaries or :class:`Row` objects, a pandas :class:`pandas.DataFrame`,
+a NumPy :class:`numpy.ndarray` or a :class:`pyspark.RDD`.
+:meth:`SparkSession.createDataFrame` accepts a `schema` argument that specifies the schema of the :class:`DataFrame`.
+When it is omitted, PySpark infers the corresponding schema by taking a sample from the data.
+
+Creating a PySpark :class:`DataFrame` from a list of lists
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([['Alice', 1], ['Bob', 5]])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` from a list of tuples
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([('Alice', 1), ('Bob', 5)])
+    >>> df.show()
+    +-----+---+
+    |   _1| _2|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` with the explicit schema specified
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> from pyspark.sql.types import *
+    >>> schema = StructType([StructField("name", StringType(), True),
+    ...     StructField("age", IntegerType(), True)])
+    >>> df = spark.createDataFrame([('Alice', 1), ('Bob', 5)], schema)
+    >>> df.show()
+    +-----+---+
+    | name|age|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` with the explicit DDL-formatted string schema specified
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([('Alice', 1), ('Bob', 5)], schema="name string, age int")
+    >>> df.show()
+    +-----+---+
+    | name|age|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` from a list of dictionaries
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([{'name': 'Alice', 'age': 1}])
+    >>> df.show()
+    +---+-----+
+    |age| name|
+    +---+-----+
+    |  1|Alice|
+    +---+-----+
+
+
+Creating a PySpark :class:`DataFrame` from a list of :class:`Row`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> from pyspark.sql import Row
+    >>> Person = Row('name', 'age')
+    >>> df = spark.createDataFrame([Person("Alice", 1), Person("Bob", 5)])
+    >>> df.show()
+    +-----+---+
+    | name|age|
+    +-----+---+
+    |Alice|  1|
+    |  Bob|  5|
+    +-----+---+
+
+
+Creating a PySpark :class:`DataFrame` from a :class:`pandas.DataFrame`

Review Comment:
   ```suggestion
   Creating a :class:`DataFrame` from a :class:`pandas.DataFrame` or a :class:`numpy.ndarray`
   ```



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+Creating a PySpark :class:`DataFrame` from a :class:`pandas.DataFrame`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> import pandas as pd
+    >>> df = spark.createDataFrame(pd.DataFrame([[1, 2]]))
+    >>> df.show()
+    +---+---+
+    |  0|  1|
+    +---+---+
+    |  1|  2|
+    +---+---+
+
+
+Creating a PySpark :class:`DataFrame` from a :class:`numpy.ndarray`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> import numpy as np
+    >>> import pandas as pd
+    >>> df = spark.createDataFrame(pd.DataFrame(data=np.array([[1, 2], [3, 4]]),
+    ...     columns=['a', 'b']))
+    >>> df.show()
+    +---+---+
+    |  a|  b|
+    +---+---+
+    |  1|  2|
+    |  3|  4|
+    +---+---+
+
+
+Creating through `read.format(...).load(...)`
+---------------------------------------------
+
+Creating a PySpark :class:`DataFrame` by reading existing **json** format file data

Review Comment:
   Here we can combine all sections to show examples:
   ```
   - Example with JSON 
   <code block>
   - Example with CSV
   <code block>
   - Example with Parquet
   <code block>
   - Example with JDBC
   <code block>
   ```



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+Creating a PySpark :class:`DataFrame` from a :class:`pandas.DataFrame`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> import pandas as pd
+    >>> df = spark.createDataFrame(pd.DataFrame([[1, 2]]))
+    >>> df.show()
+    +---+---+
+    |  0|  1|
+    +---+---+
+    |  1|  2|
+    +---+---+
+
+
+Creating a PySpark :class:`DataFrame` from a :class:`numpy.ndarray`

Review Comment:
   We can combine this with the previous section.



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+Creating through `read.format(...).load(...)`

Review Comment:
   Reading Data from Files



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+Creating a PySpark :class:`DataFrame` from a list of dictionaries
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+

Review Comment:
   ```suggestion
   Dictionaries with keys as column names can also be used.
   
   ```



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+Creating a PySpark :class:`DataFrame` with the explicit schema specified
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> from pyspark.sql.types import *

Review Comment:
   Let's not use `import *`
   ```
   from pyspark.sql.types import StructType, StructField, StringType, IntegerType
   ```



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+Creating a PySpark :class:`DataFrame` with the explicit DDL-formatted string schema specified
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    >>> df = spark.createDataFrame([('Alice', 1), ('Bob', 5)], schema = "name string, age int")

Review Comment:
   "name string, age int" Just curious, do we have any documentation on this DDL string format? How to translate a pyspark type into this DDL string format?



##########
python/docs/source/user_guide/sql/dataframe_creation.rst:
##########
@@ -0,0 +1,239 @@
+Creating a PySpark :class:`DataFrame` from a list of dictionaries

Review Comment:
   ```suggestion
   Creating a :class:`DataFrame` from Dictionaries
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
