Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18025
@felixcheung I just made a new commit which I think has the cleanest
solution so far. In this one, I implemented grouping for all aggregate
functions for Column, except those that are also defined for other classes
(`count`, `first` and `last`). As you can see, it achieves the following:
- Centralized documentation for easy navigation.
- Reduced number of items in `See also`
- Better examples using shared data. This avoids creating a separate data
frame for each function, as would be needed if they were documented separately.
- Cleaner structure and far fewer Rd files.
- Removed duplicated definitions of `@param`.
- No need to write meaningless examples for trivial functions (because of
grouping).
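
For readers unfamiliar with the approach, here is a minimal sketch of grouping with roxygen2's `@rdname` tag. It is only an illustration, not the code in the commit: the topic name `toy_aggregate_functions` and the functions `toy_mean` and `toy_range_width` are hypothetical, and they work on plain numeric vectors rather than on Column so that the snippet runs without SparkR.

```r
# Hypothetical sketch: every name below is made up for illustration.

#' Aggregate functions for numeric vectors
#'
#' All functions in this group share one Rd page, one `@param` definition,
#' and one set of examples.
#'
#' @param x a numeric vector.
#' @name toy_aggregate_functions
#' @rdname toy_aggregate_functions
#' @examples
#' x <- c(1, 2, 3, 4)  # shared data, created once for the whole group
#' toy_mean(x)
#' toy_range_width(x)
NULL

#' @details
#' \code{toy_mean}: Returns the arithmetic mean of \code{x}.
#'
#' @rdname toy_aggregate_functions
#' @export
toy_mean <- function(x) sum(x) / length(x)

#' @details
#' \code{toy_range_width}: Returns the difference between the largest and
#' the smallest value of \code{x}.
#'
#' @rdname toy_aggregate_functions
#' @export
toy_range_width <- function(x) max(x) - min(x)
```

Under this layout roxygen2 would write a single `toy_aggregate_functions.Rd` file, with each function contributing one paragraph to the Details section, so `@param x` and the examples are written only once.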
In this version, I also demonstrate that for methods defined by multiple
classes (`count`, `first` and `last`), we can still document them in their own
Rd files and simply link to them in the `See also` section. Of course, we
could combine the docs for these three into something like `shared_methods.Rd`,
since each of them is tiny.
Also, to facilitate review, perhaps we can break the changes into several
PRs, one for each of `aggregate_functions`, `datetime_functions`,
`math_functions`, and `misc_functions`?
After making the change to the Column methods, I will work on the doc for
SparkDataFrame and GroupedData.
Please let me know your thoughts.