Github user mortada commented on the issue:

    https://github.com/apache/spark/pull/15053
  
    @HyukjinKwon I understand we can have `py.test` and `doctest`, but I don't 
quite see how we could define the input DataFrame globally while at the same 
time having a clear, self-contained docstring for each function.
    
    @holdenk could you please elaborate on what you mean? 
    
    If we want to repeat something like this in every docstring
    ```
    >>> print(df.collect())
    ```
    we might as well simply include how to actually create the DataFrame so the 
user can easily reproduce the example?
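
    As a rough sketch of the difference (plain Python lists stand in for a 
DataFrame here, and `names_shared`, `names_self_contained`, `shared_rows` and 
`run_doctest` are all made-up names for illustration, not anything in PySpark), 
compare a docstring that relies on a globally injected input with one that 
builds its own:

```python
import doctest


def names_shared(rows):
    """Return the "name" field of every row.

    The input is NOT defined here; a test harness must inject `shared_rows`.

    >>> names_shared(shared_rows)
    ['Alice', 'Bob']
    """
    return [row["name"] for row in rows]


def names_self_contained(rows):
    """Return the "name" field of every row.

    The input is built inside the example, so a reader can paste it as-is.

    >>> rows = [{"name": "Alice", "age": 2}, {"name": "Bob", "age": 5}]
    >>> names_self_contained(rows)
    ['Alice', 'Bob']
    """
    return [row["name"] for row in rows]


def run_doctest(func, extra_globs=None):
    """Run the examples in func's docstring and return (failed, attempted)."""
    # Hypothetical helper: the namespace holds only the function itself plus
    # whatever the caller chooses to inject.
    globs = {func.__name__: func}
    globs.update(extra_globs or {})
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, globs, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts matter for this sketch.
    result = runner.run(test, out=lambda text: None)
    return result.failed, result.attempted
```

    Here `run_doctest(names_self_contained)` passes with no setup, while 
`run_doctest(names_shared)` fails with a `NameError` unless the caller passes 
something like `{"shared_rows": [...]}` as `extra_globs`. That hidden setup is 
exactly what someone reading a single docstring would not see.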
    
    It seems to me that the user would often want to see the docstring to 
understand how a function works, and they may not be looking at some global 
documentation as a whole. And the fact that many of the input DataFrames are 
the same is really just a convenience for the doc writer and not a requirement.
    
    For instance, this is the Examples section of the docstring for a numpy 
function (`numpy.argmax`), and the example clearly defines its own input:
    ```
    Examples
    --------
    >>> a = np.arange(6).reshape(2,3)
    >>> a
    array([[0, 1, 2],
           [3, 4, 5]])
    >>> np.argmax(a)
    5
    >>> np.argmax(a, axis=0)
    array([1, 1, 1])
    >>> np.argmax(a, axis=1)
    array([2, 2])
    ```
    
    IMHO it seems odd to require the user to look at some global doc in order 
to follow the example usage for a single function.

