Github user zero323 commented on the issue: https://github.com/apache/spark/pull/16537

Putting this particular PR and the scalability of the improvement process aside, Spark is heavily under-documented. This is something that hits Python and R users much harder than everyone else; in the worst case, when working with Scala, you can at least follow the types. It wouldn't be a problem if PySpark used consistent conventions and idiomatic Python, and didn't make hidden assumptions once in a while :) Take things like `DataFrame.replace`, or some parts of the `DStream` API (can you point out the places that expect a `function`, not just any `Callable`?), for example.

I am not really a stakeholder here, but I really believe that small things like this are crucial to creating a decent user experience. In hindsight, I overdid it with the number of tasks, but in my defense, I am pretty sure that at least some of these won't get merged. Moreover, the problem is not imaginary. For many users it is not obvious how to use udfs (http://stackoverflow.com/q/35546576, http://stackoverflow.com/q/35375255, http://stackoverflow.com/q/39254503), and the docstring of `udf` is actually the only documented example I found.
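To illustrate the `function` vs. `Callable` distinction mentioned above (without depending on Spark itself): in Python, anything implementing `__call__` is a `Callable`, but an API that silently assumes a plain `function` may poke at function-only attributes. This is a minimal, hypothetical sketch of the hidden assumption, not actual PySpark code:

```python
import types
import functools

def double(x):
    """A plain function: an instance of types.FunctionType."""
    return 2 * x

class Doubler:
    """A callable object: satisfies callable(), but is not a function."""
    def __call__(self, x):
        return 2 * x

partial_double = functools.partial(lambda a, b: a * b, 2)

# All three are Callables and behave identically when invoked...
assert double(3) == Doubler()(3) == partial_double(3) == 6

# ...but only the plain function is a FunctionType:
assert isinstance(double, types.FunctionType)
assert not isinstance(Doubler(), types.FunctionType)
assert not isinstance(partial_double, types.FunctionType)

# An API that assumes a `function` may touch attributes like __name__,
# which callable instances and partials do not carry:
print(double.__name__)                       # works for a plain function
print(hasattr(Doubler(), "__name__"))        # False
print(hasattr(partial_double, "__name__"))   # False
```

Documenting parameters as `Callable` when the implementation actually requires function-only attributes is exactly the kind of hidden assumption that only surfaces at runtime.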