Github user mortada commented on the issue:

https://github.com/apache/spark/pull/15053

@HyukjinKwon I understand we can have `py.test` and `doctest`, but I don't quite see how we could define the input DataFrame globally while at the same time having a clear, self-contained docstring for each function.

@holdenk could you please elaborate on what you mean? If we want to repeat something like this in every docstring

```
>>> print(df.collect())
```

we might as well simply include how to actually create the DataFrame, so the user can easily reproduce the example.

It seems to me that the user would often want to see the docstring to understand how a function works, and they may not be looking at some global documentation as a whole. And the fact that many of the input DataFrames are the same is really just a convenience for the doc writer, not a requirement.

For instance, this is the docstring for a numpy function (`numpy.argmax`), and the example has the input clearly defined:

```
Examples
--------
>>> a = np.arange(6).reshape(2,3)
>>> a
array([[0, 1, 2],
       [3, 4, 5]])
>>> np.argmax(a)
5
>>> np.argmax(a, axis=0)
array([1, 1, 1])
>>> np.argmax(a, axis=1)
array([2, 2])
```

IMHO it seems odd to require the user to look at some global doc in order to follow the example usage for one single function.
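To make the trade-off concrete, here is a minimal plain-Python sketch (no Spark involved) of the two docstring styles. Everything in it is hypothetical and only for illustration: `top_age`, `top_age_global`, `shared_rows`, and the `run_docstring` helper are not part of PySpark. The first docstring builds its own input; the second relies on a name that the test harness has to inject into the doctest globals.

```python
import doctest


def top_age(rows):
    """Return the largest age in a list of (age, name) tuples.

    Self-contained: the example creates its own input.

    >>> rows = [(2, 'Alice'), (5, 'Bob')]
    >>> top_age(rows)
    5
    """
    return max(age for age, _ in rows)


def top_age_global(rows):
    """Return the largest age in a list of (age, name) tuples.

    Relies on `shared_rows`, which must be defined somewhere else.

    >>> top_age_global(shared_rows)
    5
    """
    return max(age for age, _ in rows)


def run_docstring(func, extra_globals=None):
    """Run the examples in func's docstring; return (failed, attempted)."""
    # Only the function itself is visible unless extra names are injected.
    globs = {func.__name__: func}
    globs.update(extra_globals or {})
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, globs, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Swallow the failure report; only the counts are of interest here.
    return runner.run(test, out=lambda text: None)


if __name__ == "__main__":
    print(run_docstring(top_age))
    print(run_docstring(top_age_global))
    print(run_docstring(top_age_global, {"shared_rows": [(2, "Alice"), (5, "Bob")]}))
```

The self-contained example passes with nothing injected, while the second one only passes once the harness supplies `shared_rows`. A reader who copies that second example into a fresh interpreter hits a `NameError`, which is the usability concern raised above.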